readme.txt

Remote Sensing Coastal Change Simple Data Distribution Service
DOI:10.5066/P9M3NYWI

NOTE: Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

This readme file provides human-readable documentation to facilitate use and understanding of the Data Service, and to describe in clear language how it is intended to work.

TABLE OF CONTENTS

A. ABSTRACT
B. INTRODUCTION/BACKGROUND/PURPOSE
C. DATA SERVICE STRUCTURE AND FORMAT
D. DATA TYPES AND FORMATS
E. PROCESSES FOR QUALITY REVIEW/QUALITY CONTROL
F. INSTRUCTIONS FOR USE

A. ABSTRACT

This readme describes the structure and implementation of a simple, rapid data distribution service (Data Service): a web service that provides remote access to data using standard data access protocols. The Data Service is intended to provide timely, long-term, machine- and human-readable access to provisional, emergency, and approved photogrammetric imagery, derivatives, and ancillary data produced using published, standardized workflows.

The Remote Sensing Coastal Change (RSCC) Simple Data Distribution Service (hereafter referred to as the Data Service) is organized in a folder/file structure with FGDC-compliant metadata describing the immediate underlying data structure. At the root of the Data Service (https://doi.org/10.5066/P9M3NYWI) is service-level metadata and documentation (this file) describing the data organization, with folders for each data collection platform (DCP) or logical group of platforms hosted by the service. Within each DCP folder is a metadata file describing the DCP, its contents, and the naming conventions of subfolders. DCP subfolders are generally organized by collection effort, camera (for multi-camera DCPs), and/or date. Collection-level folders contain metadata for the collection and product-specific subfolders where appropriate, such as digital images intended for photogrammetric workflows, and derivatives and ancillary data including but not limited to statistical products from static image sequences; processed camera position and pose; geospatial raster and vector data; point clouds; segmented images; and image labels in scalar, vector, and raster formats.

B. INTRODUCTION/BACKGROUND/PURPOSE

The USGS Coastal and Marine Hazards and Resources Program (CMHRP) Remote Sensing Coastal Change (RSCC) project collects and produces imagery and derivatives (time sequences, photogrammetry products, segmented and labeled images) on the scale of big data: hundreds of thousands of images, with surveys every few weeks and sometimes every few days, using a standardized workflow. This document describes a Data Service designed to facilitate release of these voluminous data in a manner that supports timely best science while maintaining USGS Fundamental Science Practices (FSP) and meeting the high-quality data standards of the USGS. By prescribing standard procedures for data structure, data review, quality control, and metadata required for release of data products, this service treats photogrammetric data products from static and mobile platforms similarly to products of other voluminous regular or automated data collection, such as satellite imagery and stream gauge data.

The Data Service includes photogrammetric imagery (digital imagery used for photogrammetry) and related data products and derivatives, including but not limited to post-processed Global Navigation Satellite System (GNSS)-derived camera positions, ground control points, camera position and pose, time-integrated pixel products, depth maps, point clouds, digital elevation models, orthophotos, segmented images, and image labels. The Data Service provides a digital data repository for manual, semi-automated, and fully automated retrieval of published data (for example, data "scraped" by other databases such as ScienceBase, the Science Data Catalog, and the Imagery Data System).

C. DATA SERVICE STRUCTURE AND FORMAT

The Data Service provides remote access to data collected and processed by the USGS CMHRP RSCC Project. Data are retrieved using standard HTTP data access protocols, with data and metadata organized into a well-described, machine- and human-readable structure, in a granular format that allows users to retrieve only the data they need, through automated or manual methods.

The Data Service is structured in a folder (directory) hierarchy, with the root folder of the service (https://doi.org/10.5066/P9M3NYWI) containing a metadata file describing the service, additional documentation in this readme file, and subfolders corresponding to individual data collection platforms (DCPs) or groups of DCPs (such as groups of fixed cameras). Each subfolder in the Data Service has a descriptive name for the DCP or group of DCPs. Each DCP folder contains at minimum a metadata record with the same prefix describing DCP attributes and subfolder content and organization. DCP subfolders are organized and named by collection effort, region, camera, and/or date(s), with associated metadata describing the collection effort, such as number of images, spatial extent, start and end times, and associated field activity identifiers. Collection effort subfolders and files are named by product, date, and/or time, and contain individual and/or collection-level metadata where appropriate. Metadata for each DCP provide specific details about data organization for that platform.
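Because the folder hierarchy is served over plain HTTP with autogenerated directory index pages (see section F), it can be traversed programmatically as well as browsed. The following Python sketch is illustrative only: it assumes the third-party requests package is installed, that index pages are simple autogenerated HTML listings, and the folder URL shown is a placeholder rather than a real path in the service.

# List the entries in a Data Service folder by parsing its autogenerated
# HTML directory index. Illustrative sketch; the URL is a placeholder.
import re
import requests

def list_folder(url):
    """Return the link targets found in an HTML directory index page."""
    html = requests.get(url, timeout=30).text
    links = re.findall(r'href="([^"]+)"', html)
    # Drop index-sorting links (for example "?C=N;O=D") and the parent folder.
    return [link for link in links if not link.startswith("?") and link != "../"]

for entry in list_folder("https://<data-service-root>/<dcp-folder>/"):
    print(entry)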
D. DATA TYPES AND FORMATS

The Data Service provides static links to collections of digital data products and to individual data products in standard machine-readable formats. Data types include: camera images and image products; camera and ground control point positions; photogrammetry products; machine learning, image labeling, and segmentation products; and ancillary data. Data formats include: raster, tabular, point cloud, vector, hierarchical, and descriptive data. Individual-record and collection-level metadata are included where appropriate, and further characterize data types and naming conventions.

E. PROCESSES FOR QUALITY REVIEW/QUALITY CONTROL

Before a new data type or DCP is added to the Data Service, metadata and data peer reviews of sample data generated with standardized methods (which are described in the metadata) ensure that data products comply with FSP. Subsequent products of the same type are then published with the same methods and metadata template, updating only unique attributes of the data product such as date(s), location/extent, and other variables. Similarly, fundamental similarities in workflows allow for rapid expansion of the Data Service to new DCPs as they are developed, using existing DCPs and products as templates.

Emergency data products may be released without metadata, and provisional products may be released subject to later changes in data or metadata prior to release of approved data. All provisional and approved data will be released with appropriate disclaimers.

*Camera images and image products*

Camera images and image products are downloaded from DCPs and converted from camera raw format to a standard format if necessary. Emergency data may be released without further processing. For provisional and approved data, EXIF headers are populated with location data and annotated with additional metadata documenting data provenance, associated field activities, standard data disclaimers and restrictions for use, science center contact information, and other values consistent with USGS FSP guidance. Provisional and approved imagery are quality-checked by validating position and other required metadata and by reviewing images to remove corrupted imagery, test images, and other images not suitable for publication, if applicable. Images are then published to the Data Service.
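Because location data are written to the EXIF headers of provisional and approved images, positions can be read directly from downloaded files. The following Python sketch is illustrative only; it assumes a recent release of the Pillow library, and the image file name is a placeholder.

# Read the GPS position recorded in a downloaded image's EXIF header.
# Illustrative sketch assuming a recent Pillow release (which provides
# Image.Exif.get_ifd); the file name is a placeholder.
from PIL import Image
from PIL.ExifTags import GPSTAGS

def exif_position(path):
    """Return (latitude, longitude) in decimal degrees, or None if absent."""
    exif = Image.open(path).getexif()
    gps = {GPSTAGS.get(t, t): v for t, v in exif.get_ifd(0x8825).items()}  # GPS IFD
    if "GPSLatitude" not in gps:
        return None
    def to_degrees(dms, ref):
        # EXIF stores coordinates as degree/minute/second rationals.
        deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
        return -deg if ref in ("S", "W") else deg
    return (to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
            to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]))

print(exif_position("example_image.jpg"))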
*Camera and ground control point positions*

Tabular data containing image attributes and image and ground-control-point positions are generated with standardized methodology and published as delimited text files with metadata documenting methods in process steps and reference sources.

*Photogrammetry products*

Photogrammetry products (including but not limited to camera position and orientation, point clouds, raster image and elevation data, camera models, error estimates, and associated data) are generated with standardized methodology and published with metadata documenting methods in process steps and reference sources.

*Machine learning, image labeling, and segmentation products*

Machine learning, image labeling, and segmentation products are generated with standardized methodology and published with metadata documenting methods in process steps and reference sources.

*Ancillary products*

Ancillary products (including but not limited to lens calibration models, configuration files, control point positions on images, error reports, QA/QC and index files) are generated and published with metadata documenting methods in process steps and reference sources where appropriate.

F. INSTRUCTIONS FOR USE

1. Manual navigation and download

Users may browse the directory structure from the root of the Data Service to view a DCP directory. DCP directories contain a metadata file describing the DCP, and folders for one or more data collections, with contents and organization specified in the DCP metadata.

2. Automated retrieval

Folder names for each DCP are standardized to facilitate automated retrieval. Metadata for each DCP contain details on folder naming conventions for imagery and other products when included. Where appropriate, lists of files and attributes are included to facilitate automated downloading of a subset of images based on location, time, or other properties. Automated tools such as batch, PowerShell, or Python scripts, wget, uGet, curl, and browser mass-downloaders can be used to bulk download files from one or more collection dates, including or excluding collection types, recursively or non-recursively. A subset of images can be identified spatially when image coordinates are available in EXIF data or other files by first processing image position files to extract a subset of filenames/URLs and saving them to a list, as shown in the sketch below.
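As an illustration of that approach, the following Python sketch filters a delimited image-position file by a latitude/longitude bounding box and writes matching URLs to a list. The file name and column names are hypothetical; consult the metadata for each DCP for the actual attribute names and conventions.

# Select images inside a latitude/longitude bounding box from a delimited
# image-position file and write their URLs to a list for bulk download.
# The file name and column names are hypothetical; see the DCP metadata.
import csv

WEST, EAST, SOUTH, NORTH = -75.60, -75.45, 35.20, 35.35  # example bounds

with open("image_positions.csv", newline="") as src, \
     open("urls.txt", "w") as dst:
    for row in csv.DictReader(src):
        lon, lat = float(row["longitude"]), float(row["latitude"])
        if WEST <= lon <= EAST and SOUTH <= lat <= NORTH:
            dst.write(row["url"] + "\n")

The resulting URL list can then be passed to the second wget example below, or to curl or a similar tool.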
An example wget command to retrieve all files from a directory and save them to a user-specified location is shown below (the save directory and source URL are placeholders to be supplied by the user):

wget -nd -np -m -l 1 -R "ht*" -A "jpg" -P <save_directory> <source_URL>

Command flags are as follows:

-nd flattens the source directory tree so that it is not expanded to mimic the source site
-np prevents ascending to the parent directory
-m mirrors the content of the source web page (in this case, the images)
-l limits the recursion depth (here, to one level)
-P sets the save directory location
-R sets rejected file extensions (-R "ht*" suppresses saving autogenerated index.html files)
-A sets allowed file extensions

An example wget command to retrieve files from a list of URLs and save them to a user-specified location is:

wget -P <save_directory> -i <URL_list_file>
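For users who prefer a scripted alternative to the command-line tools above, the following Python sketch downloads each file in a URL list. It is illustrative only: it assumes the third-party requests package is installed, and the file and folder names are placeholders.

# Download every file in a URL list to a local folder, a minimal Python
# alternative to the wget examples above. File and folder names are
# placeholders.
import os
import requests

os.makedirs("downloads", exist_ok=True)
with open("urls.txt") as f:
    for url in (line.strip() for line in f if line.strip()):
        name = os.path.join("downloads", url.rsplit("/", 1)[-1])
        with requests.get(url, stream=True, timeout=60) as r:
            r.raise_for_status()
            with open(name, "wb") as out:
                # Stream in chunks so large images are not held in memory.
                for chunk in r.iter_content(1 << 16):
                    out.write(chunk)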