ICOADS Web information page (Wednesday, 29-Feb-2012 18:52:30 UTC):
US Maury Collection (deck 701; 1784-1863) (by ESRL)
1. Background
Background on the Collection and the format used for digitization is
provided in the following texts from the CD-ROM (NCDC, 1998):
about.txt
format.txt
Updated format information is provided here:
maury_format
Format translation specifications are located here:
maury_transpec
Additional translation-related information is located on these Webpages:
Time Adjustments
Temperature Corrections
Preliminary Inventories and Plots
Inventories of the ship names, and voyages, in the Collection are available
here:
mauri_out
maury_invoy (1.8 MB text file)
As detailed in Table 1, the first 13 years of the Collection contain temporal
discontinuities (missing months and years), and very few data.
Table 1. Numbers of reports and ships present in the earliest years of the US
Maury Collection. Months are frequently composed entirely of reports from a
single ship, and several months and years contain no data (February 1796 is
the last missing month).
===============================================================================
Reports Year/Month ID fields (ship names abbreviated to 8 characters)
-------------------------------------------------------------------------------
6 1784/02 EMPRE*_C
26 1784/03 EMPRE*_C
27 1784/04 EMPRE*_C
20 1792/03 GRAND_TU
30 1792/04 GRAND_TU
31 1792/05 GRAND_TU
30 1792/06 GRAND_TU
31 1792/07 GRAND_TU
21 1792/08 GRAND_TU PEGGY
25 1792/09 PEGGY
37 1793/02 GRAND_TU PEGGY
62 1793/03 GRAND_TU PEGGY
51 1793/04 GRAND_TU PEGGY
55 1793/05 GRAND_TU PANTHER PEGGY
55 1793/06 GRAND_TU PANTHER PEGGY
27 1794/07 KATY
5 1794/08 KATY
4 1794/09 KATY
31 1794/10 KATY
12 1794/11 KATY
2 1795/05 YORKTOWN
1 1795/06 YORKTOWN
1 1795/07 YORKTOWN
1 1795/08 YORKTOWN
1 1795/09 YORKTOWN
22 1795/10 YORKTOWN
27 1795/11 YORKTOWN
15 1795/12 YORKTOWN
1 1796/01 YORKTOWN
6 1796/03 ARRABIDA
30 1796/04 ARRABIDA
84 1796/05 ARRABIDA BRIDGEWA
242 1796/06 BRIDGEWA
260 1796/07 BRIDGEWA
170 1796/08 BRIDGEWA
45 1796/09 BRIDGEWA
77 1796/10 BRIDGEWA
21 1796/11 KITTY
26 1796/12 KITTY
1618 TOTAL
-------------------------------------------------------------------------------
The US Maury Collection data as obtained from the CD-ROM contained a large
number of problems, many arising from the difficulty of reading the original
microfilm records, or otherwise introduced during digitization and assembly of
the data. A five-phase process was used to address many of these problems
(secs. 2-4).
Some voyage number misassignments still exist in the US Maury data, which could
not be resolved under our schedule constraints (extensive comparisons with the
original microfilm would have been required). In most cases we suspect that
the reports were keyed in proper order (with respect to the original microfilm
sequence), but the voyage number was not updated properly during digitization
(e.g., reports at the start of a new voyage inadvertently received the previous
voyage number). This means that incorrect metadata such as ship name and type
may be attached to some reports, but the basic meteorological data are probably
at the correct times and locations. Also, these problems may have impacted
the results of position interpolation to some extent (as discussed in sec. 4).
2. Processing overview
Five general phases of processing (Fig. 1) were used to help expedite the work
and for related technical reasons (this is a simplification of the actual
processing, which involved additional steps and variations depending on the
form type of the data). Processing through Phase C is done, with Assessment
(Phase D) beginning.
           Time edit         Time/pos. assign.     Translation to LMR        Assessment
----------            ----------            ----------            -----------
 CD1 data   ---A--->   CD2 data   ---B--->   QC data    ---C--->   LMR data  ---->D
----------            ----------            ----------     ^      reject file
   ^   |___________________________________________________|        summary
   |                                                              -----------
Pre-edit
   |
----------
 CD data
----------
Figure 1. Processes (Pre-edit and A-D), and data and metadata outputs
proposed for the US Maury Collection.
In the following overview of the Pre-edit and Phases A-D, the output data
at each stage are represented in one of three formats: CD (format used for
digitization and on the CD-ROM), QC (quality control format), and LMR (LMR6).
"CD" is suffixed by a number to indicate that the data are still in the CD
format, but edited. Further details on Phases A and B are given in secs.
3-4. The files for each processing phase were divided up according to the
original microfilm reel numbers (one file for each of the 85 reels that were
digitized).
Pre-edit
A few changes to the original CD data were required to manipulate the
data on a Unix system. Most significantly, the data contained null
characters in place of some real characters. A few additional minor
problems were detected and changes made: for three control numbers,
the headers were not with the data records and were shifted; one
corrupt and redundant header record was deleted; and an erroneous form
type = 5 was changed to form type = 1 for a few records.
Input: CD
Output: CD1
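The null-character problem can be located with a short scan. The following is
only a minimal sketch, not the procedure actually used: the function name
find_null_bytes and the sample bytes are hypothetical, and the source does not
state how the null characters were replaced, so the sketch only reports where
they occur.

```python
def find_null_bytes(data: bytes):
    """Return (line_number, column) pairs, both 1-based, for every NUL byte."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte == 0:  # iterating over bytes yields integers
                hits.append((line_no, col))
    return hits


# Hypothetical sample, not real CD-format records.
sample = b"ABC\x00DEF\nGHI\n\x00JK\n"
print(find_null_bytes(sample))  # [(1, 4), (3, 1)]
```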
Phase A: Time edit (and other incidental editing)
First a number of modifications were made to the time elements
(records were not moved, and voyage numbers were not changed).
Then changes were made to day and hour to regularize the data
for the 24-hour clock, and to fix apparent problems introduced by the
noon-to-noon definition of day in some early data.
Input: CD1
Output: CD2
Additional information: see sec. 3. Also, since the record structure
has not changed, a diff could be performed between the CD1 and CD2 data
to obtain a complete list of differences.
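Because CD1 and CD2 match record for record, the differences can be listed by
comparing the two files line by line. The sketch below assumes both files have
already been read into lists of strings; the function name changed_records and
the sample records are hypothetical and do not follow the real CD layout.

```python
def changed_records(cd1_lines, cd2_lines):
    """Return (record_number, cd1_line, cd2_line) for each record that differs.

    Record numbers are 1-based. The inputs must be 1-for-1, as stated for
    the CD1 and CD2 data.
    """
    if len(cd1_lines) != len(cd2_lines):
        raise ValueError("CD1 and CD2 are expected to match 1-for-1")
    return [
        (number, old, new)
        for number, (old, new) in enumerate(zip(cd1_lines, cd2_lines), start=1)
        if old != new
    ]


# Hypothetical records: the third one had its hour edited in Phase A.
cd1 = ["H 0001 SHIP", "1850 01 02 13", "1850 01 03 01"]
cd2 = ["H 0001 SHIP", "1850 01 02 13", "1850 01 03 13"]
print(changed_records(cd1, cd2))
```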
Phase B: Time/position assignment
A condensed QC format forms the output from this process, containing the
edited time elements, the originally reported or interpolated positions,
and other information. If, for example, the interpolation failed to
produce a latitude or longitude, that element was left missing and the
report was rejected at the next (translation) phase.
Input: CD2
Output: QC records (1-for-1 with CD1 and CD2)
Additional information: see sec. 4.
Phase C: Translation to LMR
Fields were translated (as feasible) into the regular fields of the LMR
format, plus data from the CD1 and QC records were placed in the
supplemental attachment of each report. Note that we attached the CD1,
rather than CD2, records in order to preserve the more original records
(all the edited elements were provided via the QC records). Temperature
units and other corrections were also made as part of this processing.
Input: CD1 + QC records (1-for-1)
Outputs: Per reel: LMR, reject file, and conversion summary
Phase D: Assessments
Planned to include rechecking of ship tracks, climatological comparisons,
etc.
Input: LMR
Outputs: graphics, etc. (products not planned for archival)
3. Details on time edit (Phase A)
Summary: Approximately four person-months were spent analyzing the voyages
and correcting (almost exclusively) time problems. Some incidental location
and other obvious problems were also corrected, but records were not moved
(with respect to the original data sequence) and control numbers were not
changed. This work, plus other phases, could be reiterated if significant
new problems were discovered in the future.
The input CD data have the following counts:
1,414,198 total lines (data + header records)
- 12,336 header records (voyages)
---------
1,401,862 data records (reports)
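Of the data records, 20,819 were modified (~1.49%). These can be subdivided
as follows (the four counts sum to 22,191, more than 20,819, because some
records had more than one time element changed):
            Changes to:
------------------------------
  year   month     day    hour
 13673    2608    4035    1875
 0.98%   0.19%   0.29%   0.13%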
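The counts and percentages above can be checked with a few lines of
arithmetic. This is a sketch only: the variable and function names are
illustrative, and the percentages are taken relative to the number of data
records (total lines minus header records).

```python
total_lines = 1_414_198
header_records = 12_336
data_records = total_lines - header_records  # 1,401,862 reports

modified = 20_819
changes = {"year": 13_673, "month": 2_608, "day": 4_035, "hour": 1_875}


def percent(count, total=data_records):
    """Percentage of data records, rounded to two decimals."""
    return round(100.0 * count / total, 2)


print(data_records, percent(modified))
for element, count in changes.items():
    print(element, count, percent(count))

# The element counts overlap: their sum exceeds the number of modified records.
print(sum(changes.values()) - modified)
```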
There were 268 "year jumps" in the original data, i.e., a year difference other
than zero or one within an (apparent) voyage. Only about 60% of these were
corrected (a subset of the above year corrections); the remainder were left
uncorrected because they involved other problems, such as an incorrect control
number (different voyages or ships sharing a control number), duplicated
reports, or data out of order.
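A year jump of this kind can be detected by comparing each report with the
previous report of the same voyage. The sketch below is an illustration, not
the code used for Phase A. It assumes the comparison is between consecutive
reports in file order and that the difference is signed (this year minus the
previous year), neither of which is stated above; the function name
find_year_jumps is hypothetical.

```python
def find_year_jumps(reports):
    """Return 0-based indices of reports whose year jumps within a voyage.

    reports: sequence of (voyage_number, year) pairs in file order.
    A jump is a year difference from the previous report of the same
    voyage that is neither 0 nor 1 (assumed signed difference).
    """
    jumps = []
    prev_voyage, prev_year = None, None
    for index, (voyage, year) in enumerate(reports):
        if voyage == prev_voyage and (year - prev_year) not in (0, 1):
            jumps.append(index)
        prev_voyage, prev_year = voyage, year
    return jumps


# Hypothetical voyage 1 with a mis-keyed year (1815 instead of 1851).
reports = [(1, 1850), (1, 1850), (1, 1851), (1, 1815), (1, 1851), (2, 1840)]
print(find_year_jumps(reports))  # [3, 4]
```

Note that a single mis-keyed year shows up as two jumps (into and out of the
bad value), so jump counts and counts of bad records need not agree.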
Next the hours (and days) in the Collection were regularized according to
the 24-hour clock (local time). The details of this processing are described on
the Time Adjustments Webpage. The Preliminary Inventories and Plots Webpage
provides some additional background information on Phase A problems.
4. Details on time/position assignment (Phase B)
The edited time elements were carried forward in this phase (from CD2 into
the QC format). Also latitude/longitude were carried forward as reported
or interpolated, together with a few other flags and metadata in the QC
format.
Within each voyage, we attempted to interpolate missing latitude/longitude for
any reports in sequence between two reports containing observed values of
latitude/longitude (subject to constraints described below). This was
facilitated by the fact that data in the CD format were organized into
voyages, with the data generally digitized in proper time-sequence within the
digitized files, even after the time edit. One exception was logform pattern
7 (discussed on the Time Adjustments Webpage), in which the reports containing
observed latitude/longitude ended up out of sequence after time edit (special
steps were taken to properly interpolate pattern 7 voyages). When data were
otherwise out of time sequence (e.g., due to header misassignment problems),
interpolation was not performed.
To avoid having to manipulate the entire (fairly voluminous) CD dataset, the
interpolation output consisted of the abbreviated QC format, containing: voyage
number, reel sequence number (with respect to the original microfilm reel),
time (year, month, day, and hour), position (latitude and longitude), and the
LMR lat/lon indicator (LI). LI contained missing (if latitude and longitude
ended up missing), or one of the following values:
3 = interpolated
4 = degrees and minutes
Data were processed one reel (file) at a time. First latitude was interpolated
(for the entire file), and then longitude, to take better advantage of more
frequent reporting of observed latitude than observed longitude (owing to
early navigational constraints). This could result in reports with latitude
originally reported, but longitude interpolated (flagged LI=3). In hindsight,
an additional LI value:
6 = other (refer to metadata)
could have been used to distinguish the mixed case of latitude observed and
longitude interpolated (but the CD1 records are available in the LMR format
if it is desired to isolate this case).
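The LI assignment described above can be summarized in a small function. This
is a sketch under assumptions: the function name assign_li and the
"reported"/"interpolated" labels are hypothetical, LI is treated as missing
when either element is missing (the text states this only for the case where
latitude and longitude ended up missing), and a report with interpolated
latitude and reported longitude is assumed to receive LI=3 as well.

```python
LI_INTERPOLATED = 3  # interpolated
LI_DEG_MIN = 4       # degrees and minutes


def assign_li(lat_source, lon_source):
    """Return the LI value for one report, or None if LI is missing.

    Each argument is "reported", "interpolated", or None (element missing).
    """
    if lat_source is None or lon_source is None:
        return None  # assumption: such reports are rejected in Phase C anyway
    if "interpolated" in (lat_source, lon_source):
        return LI_INTERPOLATED  # includes latitude reported, longitude interpolated
    return LI_DEG_MIN


print(assign_li("reported", "interpolated"))  # 3
```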
Interpolation was performed using simple linear interpolation in each dimension
(latitude or longitude), rather than spherical coordinates (great circle
calculations). Except over large distances this probably produced satisfactory
results (also considering the relatively coarse resolution of early reported
positions).
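Simple linear interpolation in one dimension can be written as follows. This
is a generic sketch rather than the code used for the Collection; the function
name interpolate_linear is hypothetical, and the hours are assumed to be
expressed on a continuous scale so that they can be subtracted directly.

```python
def interpolate_linear(hr1, val1, hr2, val2, hr):
    """Linearly interpolate a latitude or longitude at time hr.

    (hr1, val1) and (hr2, val2) are the bracketing observed values,
    with hr1 < hr2 and hr1 <= hr <= hr2.
    """
    if hr2 <= hr1:
        raise ValueError("bracketing times must be strictly increasing")
    fraction = (hr - hr1) / (hr2 - hr1)
    return val1 + fraction * (val2 - val1)


# A report 6 hours after a noon position of 10.0, with 12.0 at the next noon.
print(interpolate_linear(0, 10.0, 24, 12.0, 6))  # 10.5
```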
If lat1 and lat2 (hr1 and hr2) were the two reported latitudes (hours), we
calculated:
dlat = lat2 - lat1
dhr = hr2 - hr1 (using "julian" hours)
Interpolation was not performed if dhr was negative (a jump backwards in time),
if the two bracketing reports (ob1 and ob2) did not have the same voyage number,
or if:
dhr > 3 months
|dlat/dhr| > 8 degrees in 24 hours (i.e., > 20 knots)
|dlat| > 32 degrees
Since there are approximately 60 nautical miles or 111 km per degree of
latitude (actually using 1852 m per international nautical mile):
max 24-hour distance = 889 km = 552 statute miles
max abs value of dlat = 3556 km = 2209 statute miles
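The latitude tests and the distance conversions can be sketched as below. The
names are hypothetical, hours are assumed to be continuous ("julian") hour
counts, and "3 months" is assumed to mean 90 days, which the text does not
specify. The rate test is written without a division so that dhr = 0 does not
cause an error.

```python
HOURS_PER_DAY = 24
MAX_GAP_HOURS = 3 * 30 * HOURS_PER_DAY  # assumption: "3 months" = 90 days
MAX_RATE_DEG_PER_DAY = 8.0              # 8 degrees in 24 hours (20 knots)
MAX_DLAT_DEG = 32.0
KM_PER_DEGREE = 60 * 1.852              # 60 nautical miles of 1852 m each


def can_interpolate_latitude(lat1, hr1, voyage1, lat2, hr2, voyage2):
    """Return True if latitude may be interpolated between two reports."""
    dlat = lat2 - lat1
    dhr = hr2 - hr1
    if voyage1 != voyage2 or dhr < 0:
        return False
    if dhr > MAX_GAP_HOURS or abs(dlat) > MAX_DLAT_DEG:
        return False
    # |dlat/dhr| > 8 degrees per 24 hours, rearranged to avoid dividing by dhr
    if abs(dlat) * HOURS_PER_DAY > MAX_RATE_DEG_PER_DAY * dhr:
        return False
    return True


print(round(MAX_RATE_DEG_PER_DAY * KM_PER_DEGREE))  # 889 (km in 24 hours)
print(round(MAX_DLAT_DEG * KM_PER_DEGREE))          # 3556 (km)
print(can_interpolate_latitude(10.0, 0, 7, 12.0, 24, 7))  # True
```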
Then for each pair of corresponding longitudes, lon1 and lon2, we calculated:
dlon = lon2 - lon1 (adjusted accordingly if voyage crossed dateline)
wlon = dlon * cos(ylat)
wdis = sqrt(wlon**2 + dlat**2)
where ylat was the mean of lat1 and lat2, lat1 (lat2) if lat2 (lat1) was missing,
or 45 degrees if both were missing. Similarly to latitude, interpolation was
not performed if:
dhr > 3 months
|wlon/dhr| > 8 degrees in 24 hours (i.e., > 20 knots)
|wdis| > 32 degrees
Note that reports lacking latitude and/or longitude were rejected during
the next phase of processing (translation to LMR). Reports with latitude
successfully interpolated, but not longitude (due to stricter tests), were
rejected by this means.
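The longitude tests described in this section can be sketched in the same way.
The names are hypothetical and several details are assumptions not stated
above: the dateline adjustment is implemented by folding dlon into the range
-180 to 180 degrees, dlat is taken as zero in wdis when either latitude is
missing, "3 months" is taken as 90 days, and a negative dhr is rejected as for
latitude.

```python
import math

MAX_GAP_HOURS = 3 * 30 * 24  # assumption: "3 months" = 90 days


def weighted_longitude_step(lon1, lon2, lat1=None, lat2=None):
    """Return (wlon, wdis) in degrees for a pair of longitudes."""
    dlon = lon2 - lon1
    if dlon > 180.0:       # assumed dateline adjustment: take the short way
        dlon -= 360.0
    elif dlon < -180.0:
        dlon += 360.0
    if lat1 is not None and lat2 is not None:
        ylat, dlat = (lat1 + lat2) / 2.0, lat2 - lat1
    elif lat1 is not None or lat2 is not None:
        ylat, dlat = (lat1 if lat1 is not None else lat2), 0.0
    else:
        ylat, dlat = 45.0, 0.0
    wlon = dlon * math.cos(math.radians(ylat))
    wdis = math.sqrt(wlon ** 2 + dlat ** 2)
    return wlon, wdis


def can_interpolate_longitude(lon1, lon2, lat1, lat2, dhr):
    """Apply the three longitude tests; dhr is in hours."""
    if dhr < 0 or dhr > MAX_GAP_HOURS:
        return False
    wlon, wdis = weighted_longitude_step(lon1, lon2, lat1, lat2)
    if abs(wlon) * 24 > 8.0 * dhr:  # more than 8 degrees in 24 hours
        return False
    return wdis <= 32.0


# Crossing the dateline from 179E to 179W at the equator is a 2-degree step.
print(weighted_longitude_step(179.0, -179.0, 0.0, 0.0))
```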
Reference
NCDC (National Climatic Data Center), 1998: The Maury Collection:
Global Ship Observations, 1792-1910 (CD-ROM, Version 1.0, February
1998). NCDC, Asheville, NC.
Document maintained by icoads@noaa.gov
Updated: Feb 29, 2012 18:52:30 UTC
http://www.icoads.noaa.gov/maury.html