The data-storage format employed for this database is Unidata's NetCDF. NetCDF is a general, self-documenting, machine-transportable data format created and supported by the University Corporation for Atmospheric Research (UCAR) (http://www.unidata.ucar.edu/packages/netcdf/). NetCDF was chosen because it is widely used in the climate modeling community, is independent of hardware platform and operating system, and has a variety of helper applications already developed for data access and visualization. NetCDF files are typically made up of variables that contain measurements or computed values and attributes that describe the contents of the file or variables. We employ attribute and variable names from one of the few oceanographic data specifications available in the 1980s, which is called EPIC (Equatorial Pacific Information Collection) (http://www.pmel.noaa.gov/epic/). EPIC was developed by the NOAA Pacific Marine Environmental Laboratory (PMEL) to analyze, manage, and display in situ oceanographic data. By employing EPIC-compliant netCDF, this database may be used by researchers from different organizations without having to translate "foreign" data types into the local vernacular. Using a known vocabulary also makes these data easier for other computers to discover and to incorporate into larger data-aggregation sites. A list of the EPIC keys that may occur in our data is provided in appendix 3, but a single file will contain only a subset of these variables.
One of the advantages of employing netCDF format is that the metadata are stored with the data. A typical netCDF file in this database (.cdf and .nc suffixes) will have global attributes that describe what, where, when, and how the data were collected. Global attributes apply to all the variables in the file, while each variable will have attributes that apply to the contents of that specific variable. The mooring number, data start date and time, data end date and time, position, instrument type, and sample rate are all metadata stored in the global attributes. Attributes and their possible values and usage are discussed in the following sections.
Coordinate variables are those used to describe the dimensions of the measurement variables. The files in the USGS database of oceanographic time-series measurements are typically four dimensional, where time, depth, latitude, and longitude are the dimensions (and coordinate variable names). The time dimension corresponds to the number of samples in the file. Sample measurement time (see Data Processing) is computed from coordinate variables named "time" and "time2". Depth may have one or more values (for example, if a vertical current profile was measured at 14 heights above the bed by an ADCP, the depth dimension would have 14 values). Latitude and longitude have single values because our observations are from static platforms, but were defined as dimensions to preserve the option of employing drifting instruments that have time-varying position. Coordinate variables (time, time2, depth, lat, lon, freq, dir) may never have a FillValue_ attribute, because they cannot have gaps.
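As a minimal sketch of how the two time variables might be combined, the Python below assumes the EPIC convention that "time" holds a true Julian day number (with 2440000 corresponding to 0000 hours on 23 May 1968) and "time2" holds milliseconds elapsed since the start of that day. That convention is not stated in this section and should be checked against the Data Processing section and the EPIC documentation; the function name epic_to_datetime is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption (not stated in this section): EPIC "true Julian day" 2440000
# begins at 00:00 on 23 May 1968, and time2 is milliseconds since midnight.
EPIC_REFERENCE_DAY = 2440000
EPIC_REFERENCE_DATE = datetime(1968, 5, 23)


def epic_to_datetime(time, time2):
    """Combine an EPIC time/time2 pair into a single datetime (hypothetical helper)."""
    days = int(time) - EPIC_REFERENCE_DAY
    return EPIC_REFERENCE_DATE + timedelta(days=days, milliseconds=int(time2))


if __name__ == "__main__":
    # A sample taken 600 000 ms (10 minutes) after the start of day 2440588.
    print(epic_to_datetime(2440588, 600000))
```

Keeping the day count and the within-day offset as two integers avoids the rounding that a single floating-point day number would introduce at millisecond resolution.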
The netCDF file also contains data variables (the actual measurements) named using EPIC conventions. For example, if the variable contains seawater temperature measurements, it would be called T_28; if it contains east current velocity, it would be called u_1205. FillValue_ is used in data variables to indicate where data were unrecoverable or missing; we set it to 1e35. Attributes associated with each variable describe the units (for example, degrees Celsius, centimeters per second), sensor height on the tripod, data maxima and minima, and the sensor model and serial number that go with the data.
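The naming pattern and the fill value described above can be illustrated with a short Python sketch. It assumes the variable name is a short name and a numeric EPIC code joined by an underscore, as in T_28 and u_1205, and that values have already been read into a plain list. The helper names split_epic_name and mask_fill are hypothetical, and the comparison tolerance is an assumption made so that a fill value stored in single precision is still recognized.

```python
import math

FILL_VALUE = 1e35  # fill value used for data variables in this database


def split_epic_name(var_name):
    """Split 'T_28' into ('T', 28); return (var_name, None) if there is no numeric code."""
    short, sep, code = var_name.rpartition("_")
    if sep and code.isdigit():
        return short, int(code)
    return var_name, None


def mask_fill(values, fill=FILL_VALUE):
    """Return a copy of values with fill entries replaced by NaN.

    The relative tolerance is an assumption; it allows for a fill value
    that was rounded when stored in single precision.
    """
    return [math.nan if math.isclose(v, fill, rel_tol=1e-6) else v for v in values]


if __name__ == "__main__":
    print(split_epic_name("u_1205"))
    print(mask_fill([12.5, 1e35, 11.9]))
```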
Global Attributes
This section describes usage of the generic global attribute fields in USGS/CMGP netCDF files. The metadata included are a combination of attributes defined in the EPIC conventions, with additional descriptors CMGP investigators find useful. EPIC attributes are CAPITALIZED; the ones added by CMGP are not, or may be Of_Mixed_Case.
Table 3 shows the possible values of EPIC global attributes named INST_TYPE, DATA_TYPE, and DATA_SUB_TYPE that describe the sensor. These terms may be used by other software to determine how the data are treated, so consistency in terms is needed. Column 1 is the generic instrument name we use; columns 2 to 4 are the terms required by EPIC for the attribute names in the first row of each column. Other options may exist for some attributes; for instance, DATA_TYPE may be PROFILE for a CTD lowered from a ship, but because our CTD measurements are made at a single depth, by EPIC's rules, the DATA_TYPE must be TIME.
Table 3: Equatorial Pacific Information Collection (EPIC) attributes that depend on instrument type.
Generic name | INST_TYPE | DATA_TYPE | DATA_SUB_TYPE |
ADCP | RD Inst. ADCP | ADCP | MOORED |
ADCP | Nortek Aquadopp | ADCP | MOORED |
ADCP | RD Inst. ADCP | ADCP | MOORED |
waves | RD Inst. ADCP | WAVESPEC | N/A |
CT | SeaBird SeaCAT | TIME* | N/A |
CT | SeaBird MicroCAT | TIME* | N/A |
CT | BR-6999 | TIME* | N/A |
ADV | Sontek ADV | TIME* | N/A |
PCADP | Sontek PCADP | ADCP | MOORED |
ABSS | Aquatec Aquascat ABS | ABS | N/A |
* for DATA_TYPE = TIME, no DATA_SUB_TYPE is required
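Because other software may branch on these terms, a consistency check against table 3 can catch typing errors before a file is distributed. The following is only a sketch: the dictionary is transcribed from table 3, while the function name terms_are_consistent and the choice to treat "N/A" or an empty string as "no sub-type given" are assumptions made for illustration.

```python
# Allowed (DATA_TYPE, DATA_SUB_TYPE) pairs for each INST_TYPE, transcribed from table 3.
# None stands for the "N/A" entries in the table.
TABLE_3_TERMS = {
    "RD Inst. ADCP": {("ADCP", "MOORED"), ("WAVESPEC", None)},
    "Nortek Aquadopp": {("ADCP", "MOORED")},
    "SeaBird SeaCAT": {("TIME", None)},
    "SeaBird MicroCAT": {("TIME", None)},
    "BR-6999": {("TIME", None)},
    "Sontek ADV": {("TIME", None)},
    "Sontek PCADP": {("ADCP", "MOORED")},
    "Aquatec Aquascat ABS": {("ABS", None)},
}


def terms_are_consistent(inst_type, data_type, data_sub_type=None):
    """Return True if the three attribute values match a row of table 3."""
    # Assumption: an empty or "N/A" sub-type means no sub-type was given.
    if data_sub_type in ("", "N/A"):
        data_sub_type = None
    return (data_type, data_sub_type) in TABLE_3_TERMS.get(inst_type, set())


if __name__ == "__main__":
    print(terms_are_consistent("RD Inst. ADCP", "ADCP", "MOORED"))
    print(terms_are_consistent("SeaBird SeaCAT", "PROFILE"))
```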
The CMGP also includes many instrument-specific identification and configuration details that may help users reconstruct how the data were collected and processed. For instance, ADCP data files typically include the following added attributes (among others):
- transform : earth
- orientation : up
- frequency : 300
- pings_per_ensemble : 60
EPIC conventions also specify the attributes shown in table 4 that are present in all data files. These specify who did the work, why, how often, and other details of what is expected in the data.
Table 4: Equatorial Pacific Information Collection (EPIC) attribute names found in all data types.
Attribute | Description | Example value |
PROJECT | Long name of Research Proj (funding) | 'USGS Coastal Marine Geology Program' |
EXPERIMENT | Identifier chosen for experiment | 'BOSTON' |
DESCRIPTION | specific site identifier | 'B BUOY' |
MOORING | numeric id of the mooring/instrument | 7671 (use 4 digits) |
DELTA_T | sample interval | 600 (always seconds) |
WATER_DEPTH | best version of water depth at site | 60 (always meters) |
VAR_FILL | indicator of bad or missing data | 1.0e35 |
VAR_DESC | short list of variables in the file | 'u:v:w:Werr:AGC:PGd:Tx:P' |
DATA_CMNT | provides additional information | 'NO Pressure logged' |
COMPOSITE | number of pieces in a composite series | 0 if not composite |
FILL_FLAG | were fill values inserted? | 0 if no, 1 if yes |
DRIFTER | is the platform drifting? | 0 if no, 1 if yes |
POS_CONST | is the position consistent? | 0 if it doesn't move, 1 if not consistent |
DEPTH_CONST | does the depth change? | 0 if consistent, 1 if not consistent |
DATA_ORIGIN | organization collecting the data | 'USGS WHSC Sed. Trans. Group' |
COORD_SYSTEM | how are coordinates mapped? | 'GEOGRAPHICAL' |
CREATION_DATE | USGS WHSC usage is that this is the last MODIFIED date, not the initial creation date | '31-Jan-2005 13:24:00' |
WATER_MASS | description of water sampled | normally unused |
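Two of the attributes in table 4 pack structured information into a string. As a sketch, the Python below splits VAR_DESC on its colon separators and parses CREATION_DATE. The date pattern is inferred from the single example value in table 4 and is therefore an assumption, as are the helper names parse_var_desc and parse_creation_date.

```python
from datetime import datetime

# Assumption: every CREATION_DATE follows the layout of the example
# '31-Jan-2005 13:24:00' (day, English month abbreviation, year, 24-hour time).
CREATION_DATE_FORMAT = "%d-%b-%Y %H:%M:%S"


def parse_var_desc(var_desc):
    """Split a VAR_DESC string such as 'u:v:w' into a list of short variable names."""
    return [name for name in var_desc.split(":") if name]


def parse_creation_date(text):
    """Parse a CREATION_DATE string into a datetime (last-modified time in USGS WHSC usage)."""
    return datetime.strptime(text, CREATION_DATE_FORMAT)


if __name__ == "__main__":
    print(parse_var_desc("u:v:w:Werr:AGC:PGd:Tx:P"))
    print(parse_creation_date("31-Jan-2005 13:24:00"))
```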
The attributes listed in table 5 are not required by EPIC but have been included in the more recently processed files to document the deployment details and processing steps more accurately. The Conventions attribute tells other programs what vocabulary was used in attribute and variable naming. It is similar to indicating "this page is written in Danish"; it helps software interpret the information correctly.
Table 5: Additional attributes typically employed.
Attribute | Description |
Deployment_date | date deployed |
Recovery_date | date recovered |
latitude | deployment latitude |
longitude | deployment longitude |
magnetic_variation | from NOAA web site for position and time |
start_time | time of first record in file |
stop_time | time of last record in file |
SciPi | scientist responsible for the data |
history * | all processing steps appended; most recent thing done is first in list |
Conventions | PMEL/EPIC |
serial_number | Instrument or sensor serial number |
inst_height | Instrument or sensor HAB (m) |
inst_depth | Instrument or sensor depth (m) |
inst_height_note | note about accuracy |
inst_depth_note | note about accuracy |
* The history attribute is the best place to look for experiment- or sensor-specific actions that may have occurred during processing. If data were truncated, it will be indicated here. Other actions, including which programs were run and the processing sequence, are also listed in the history attribute.
Variable Attributes
Each variable in the file has its own attributes to describe and quantify the contents. The descriptors in the left column of table 6 are found in most variables; the column on the right contains sample values. If the parameter has more than one dimension, the minimum and maximum may be vector quantities instead of scalars. The minimum and maximum of the data have the same units as the data; in the example in table 6, because the transducer temperature is in 'degrees C', the maximum and minimum are as well. The sensor depth (water depth minus sensor height) is always in meters. FillValue_ is the number that represents erroneous or missing data in a time series. Software may display FillValue_ as 1.00000004091848e+035, which is how the nominal value 1.0e+35 appears when it is stored in single precision. The valid_range attribute specifies the potential range of acceptable data.
Table 6: Attributes associated with each variable.
Attribute | Example value |
name | 'Tx' |
long_name | 'ADCP Transducer Temp.' |
generic_name | 'temp' |
units | 'degrees C' |
epic_code | 1211 |
sensor_type | 'RD Instruments ADCP' |
sensor_depth | 30.558629989624 |
serial_number | 138 |
minimum | 4.65000009536743 |
maximum | 12.0299997329712 |
valid_range | [-5 40] |
FillValue_ | 1.00000004091848e+035 |
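The long FillValue_ in table 6 is consistent with 1.0e+35 having been rounded to the nearest IEEE 754 single-precision number and then printed with double-precision digits. The sketch below reproduces that rounding with the Python standard library; it assumes the variable is stored as a 32-bit float, and the helper name as_float32 is hypothetical.

```python
import struct


def as_float32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


stored = as_float32(1.0e35)

if __name__ == "__main__":
    print(f"{stored:.14e}")              # digits as seen when a 32-bit fill is widened
    print(stored == 1.0e35)              # exact comparison with the nominal value
    print(as_float32(1.0e35) == stored)  # comparison after rounding both sides
```

One practical consequence is that testing for missing data with an exact comparison against 1.0e35 can fail; rounding both sides to single precision, or comparing with a tolerance, avoids this.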
Equatorial Pacific Information Collection (EPIC) Keywords
The tables in Appendix 4 list the EPIC code numbers and associated variable names that are found in this database. If the column for numeric code is blank, the name did not exist in EPIC and was added to describe a type of measurement or computed property. If an * is present in the name and more than one sensor of that type is present, the * is replaced by a number; that is, Sed*_981 becomes Sed1_981 and Sed2_981.
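The wildcard rule can be sketched in a few lines of Python. The function name expand_wildcard_name is hypothetical. Numbering from 1 follows the Sed1_981 and Sed2_981 example. How a name is written when only one sensor is present is not specified here, so the sketch handles only two or more sensors and raises an error otherwise.

```python
def expand_wildcard_name(pattern, sensor_count):
    """Expand a name such as 'Sed*_981' into one numbered name per sensor.

    Assumption: sensors are numbered consecutively from 1, as in the
    Sed1_981 / Sed2_981 example. The single-sensor case is not covered.
    """
    if "*" not in pattern:
        return [pattern]
    if sensor_count < 2:
        raise ValueError("the numbering rule is only described for two or more sensors")
    return [pattern.replace("*", str(i), 1) for i in range(1, sensor_count + 1)]


if __name__ == "__main__":
    print(expand_wildcard_name("Sed*_981", 2))
```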