readme.txt

Remote Sensing Coastal Change Simple Data Distribution Service
DOI:10.5066/P9M3NYWI

NOTE: Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

This readme file provides human-readable documentation to facilitate use and understanding of the Data Service, and to describe in clear language how it is intended to work.

TABLE OF CONTENTS

A. ABSTRACT
B. INTRODUCTION/BACKGROUND/PURPOSE
C. DATA SERVICE STRUCTURE AND FORMAT
D. DATA TYPES AND FORMATS
E. PROCESSES FOR QUALITY REVIEW/QUALITY CONTROL
F. INSTRUCTIONS FOR USE

A. ABSTRACT

This readme describes the structure and implementation of a simple, rapid data distribution service (Data Service): a web service that provides remote access to data using standard data access protocols. The Data Service is intended to provide timely, long-term, machine- and human-readable access to provisional, emergency, and approved photogrammetric imagery, derivatives, and ancillary data produced using published, standardized workflows.

The Remote Sensing Coastal Change (RSCC) Simple Data Distribution Service (hereafter referred to as the Data Service) is organized in a folder/file structure with FGDC-compliant metadata describing the immediate underlying data structure. At the root of the Data Service (https://doi.org/10.5066/P9M3NYWI) is service-level metadata and documentation (this file) describing the data organization, with folders for each data collection platform (DCP) or logical group of platforms hosted by the service. Within each DCP folder is a metadata file describing the DCP, its contents, and the naming conventions of subfolders. DCP subfolders are generally organized by collection effort, camera (for multi-camera DCPs), and/or date. Collection-level folders contain metadata for the collection and product-specific subfolders where appropriate, such as digital images intended for photogrammetric workflows, and derivatives and ancillary data including but not limited to statistical products from static image sequences; processed camera position and pose; geospatial raster and vector data; point clouds; segmented images; and image labels in scalar, vector, and raster formats.

B. INTRODUCTION/BACKGROUND/PURPOSE

The USGS Coastal and Marine Hazards and Resources Program (CMHRP) Remote Sensing Coastal Change (RSCC) project collects and produces imagery and derivatives (time sequences, photogrammetry products, segmented and labeled images) on the scale of big data: hundreds of thousands of images, with surveys every few weeks and sometimes every few days, using a standardized workflow. This document describes a Data Service designed to facilitate release of these voluminous data in a manner that supports timely best science while maintaining USGS Fundamental Science Practices (FSP) and meeting the high-quality data standards of the USGS. By prescribing standard procedures for data structure, data review, quality control, and metadata required for release of data products, this service treats photogrammetric data products from static and mobile platforms similarly to products of other voluminous regular or automated data collection, such as satellite imagery and stream gauge data.

The Data Service includes photogrammetric imagery (digital imagery used for photogrammetry) and related data products and derivatives, including but not limited to post-processed Global Navigation Satellite System (GNSS)-derived camera positions, ground control points, camera position and pose, time-integrated pixel products, depth maps, point clouds, digital elevation models, orthophotos, segmented images, and image labels. The Data Service provides a digital data repository for manual, semi-automated, and fully automated retrieval of published data (for example, data "scraped" by other databases such as ScienceBase, the Science Data Catalog, and the Imagery Data System).

C. DATA SERVICE STRUCTURE AND FORMAT

The Data Service provides remote access to data collected and processed by the USGS CMHRP RSCC Project. Data are retrieved using standard HTTP data access protocols, with data and metadata organized into a well-described, machine- and human-readable structure, in a granular format that allows users to retrieve only the data they need, through automated or manual methods.

The Data Service is structured in a folder (directory) hierarchy, with the root folder of the service (https://doi.org/10.5066/P9M3NYWI) containing a metadata file describing the service, additional documentation in this readme file, and subfolders corresponding to individual data collection platforms (DCPs) or groups of DCPs (such as groups of fixed cameras). Each subfolder in the Data Service has a descriptive name for the DCP or group of DCPs. Each DCP folder contains at minimum a metadata record with the same prefix describing DCP attributes and subfolder content and organization. DCP subfolders are organized and named by collection effort, region, camera, and/or date(s), with associated metadata describing the collection effort, such as number of images, spatial extent, start and end times, and associated field activity identifiers. Collection effort subfolders and files are named by product, date, and/or time, and contain individual and/or collection-level metadata where appropriate. Metadata for each DCP provide specific details about data organization for that platform.
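Because the folder hierarchy is served over plain HTTP with autogenerated directory index pages (see section F), it can be traversed programmatically as well as browsed. The following Python sketch is illustrative only: it assumes the third-party requests package is installed, that index pages are simple autogenerated HTML listings, and the folder URL shown is a placeholder rather than a real path in the service.

# List the entries in a Data Service folder by parsing its autogenerated
# HTML directory index. Illustrative sketch; the URL is a placeholder.
import re
import requests

def list_folder(url):
    """Return the link targets found in an HTML directory index page."""
    html = requests.get(url, timeout=30).text
    links = re.findall(r'href="([^"]+)"', html)
    # Drop index-sorting links (for example "?C=N;O=D") and the parent folder.
    return [link for link in links if not link.startswith("?") and link != "../"]

for entry in list_folder("https://<data-service-root>/<dcp-folder>/"):
    print(entry)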
D. DATA TYPES AND FORMATS

The Data Service provides static links to collections of digital data products and to individual data products in standard machine-readable formats. Data types include: camera images and image products; camera and ground control point positions; photogrammetry products; machine learning, image labeling, and segmentation products; and ancillary data. Data formats include: raster, tabular, point cloud, vector, hierarchical, and descriptive data. Individual-record and collection-level metadata are included where appropriate, and further characterize data types and naming conventions.

E. PROCESSES FOR QUALITY REVIEW/QUALITY CONTROL

Before a new data type or DCP is added to the Data Service, metadata and data peer reviews of sample data generated with standardized methods (which are described in the metadata) ensure that data products comply with FSP. Subsequent products of the same type are then published with the same methods and metadata template, updating only unique attributes of the data product such as date(s), location/extent, and other variables. Similarly, fundamental similarities in workflows allow for rapid expansion of the Data Service to new DCPs as they are developed, using existing DCPs and products as templates.

Emergency data products may be released without metadata, and provisional products may be released subject to later changes in data or metadata prior to release of approved data. All provisional and approved data will be released with appropriate disclaimers.

*Camera images and image products*

Camera images and image products are downloaded from DCPs and converted from camera raw format to a standard format if necessary. Emergency data may be released without further processing. For provisional and approved data, EXIF headers are populated with location data and annotated with additional metadata documenting data provenance, associated field activities, standard data disclaimers and restrictions for use, science center contact information, and other values consistent with USGS FSP guidance. Provisional and approved imagery are quality-checked by validating position and other required metadata and by reviewing images to remove corrupted imagery, test images, and other images not suitable for publication, if applicable. Images are then published to the Data Service.
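Because location data are written to the EXIF headers of provisional and approved images, positions can be read directly from downloaded files. The following Python sketch is illustrative only; it assumes a recent release of the Pillow library, and the image file name is a placeholder.

# Read the GPS position recorded in a downloaded image's EXIF header.
# Illustrative sketch assuming a recent Pillow release (which provides
# Image.Exif.get_ifd); the file name is a placeholder.
from PIL import Image
from PIL.ExifTags import GPSTAGS

def exif_position(path):
    """Return (latitude, longitude) in decimal degrees, or None if absent."""
    exif = Image.open(path).getexif()
    gps = {GPSTAGS.get(t, t): v for t, v in exif.get_ifd(0x8825).items()}  # GPS IFD
    if "GPSLatitude" not in gps:
        return None
    def to_degrees(dms, ref):
        # EXIF stores coordinates as degree/minute/second rationals.
        deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
        return -deg if ref in ("S", "W") else deg
    return (to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
            to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]))

print(exif_position("example_image.jpg"))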
*Camera and ground control point positions*

Tabular data containing image attributes and image and ground-control-point positions are generated with standardized methodology and published as delimited text files with metadata documenting methods in process steps and reference sources.

*Photogrammetry products*

Photogrammetry products (including but not limited to camera position and orientation, point clouds, raster image and elevation data, camera models, error estimates, and associated data) are generated with standardized methodology and published with metadata documenting methods in process steps and reference sources.

*Machine learning, image labeling, and segmentation products*

Machine learning, image labeling, and segmentation products are generated with standardized methodology and published with metadata documenting methods in process steps and reference sources.

*Ancillary products*

Ancillary products (including but not limited to lens calibration models, configuration files, control point positions on images, error reports, QA/QC and index files) are generated and published with metadata documenting methods in process steps and reference sources where appropriate.

F. INSTRUCTIONS FOR USE

1. Manual navigation and download

Users may browse the directory structure from the root of the Data Service to view a DCP directory. DCP directories contain a metadata file describing the DCP, and folders for one or more data collections, with contents and organization specified in the DCP metadata.

2. Automated retrieval

Folder names for each DCP are standardized to facilitate automated retrieval. Metadata for each DCP contain details on folder naming conventions for imagery and other products when included. Where appropriate, lists of files and attributes are included to facilitate automated downloading of a subset of images based on location, time, or other properties. Automated tools such as batch, PowerShell, or Python scripts, wget, uGet, curl, and browser mass-downloaders can be used to bulk download files from one or more collection dates, including or excluding collection types, recursively or non-recursively. A subset of images can be identified spatially when image coordinates are available in EXIF data or other files by first processing image position files to extract a subset of filenames/URLs and saving them to a list, as shown in the sketch below.
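As an illustration of that approach, the following Python sketch filters a delimited image-position file by a latitude/longitude bounding box and writes matching URLs to a list. The file name and column names are hypothetical; consult the metadata for each DCP for the actual attribute names and conventions.

# Select images inside a latitude/longitude bounding box from a delimited
# image-position file and write their URLs to a list for bulk download.
# The file name and column names are hypothetical; see the DCP metadata.
import csv

WEST, EAST, SOUTH, NORTH = -75.60, -75.45, 35.20, 35.35  # example bounds

with open("image_positions.csv", newline="") as src, \
     open("urls.txt", "w") as dst:
    for row in csv.DictReader(src):
        lon, lat = float(row["longitude"]), float(row["latitude"])
        if WEST <= lon <= EAST and SOUTH <= lat <= NORTH:
            dst.write(row["url"] + "\n")

The resulting URL list can then be passed to the second wget example below, or to curl or a similar tool.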
An example wget command to retrieve all files from a directory and save them to a user-specified location is shown below (the save directory and source URL are placeholders to be supplied by the user):

wget -nd -np -m -l 1 -R "ht*" -A "jpg" -P <save_directory> <source_URL>

Command flags are as follows:

-nd flattens the source directory tree so that it is not expanded to mimic the source site
-np prevents ascending to the parent directory
-m mirrors the content of the source web page (in this case, the images)
-l limits the recursion depth (here, to one level)
-P sets the save directory location
-R sets rejected file extensions (-R "ht*" suppresses saving autogenerated index.html files)
-A sets allowed file extensions

An example wget command to retrieve files from a list of URLs and save them to a user-specified location is:

wget -P <save_directory> -i <URL_list_file>
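For users who prefer a scripted alternative to the command-line tools above, the following Python sketch downloads each file in a URL list. It is illustrative only: it assumes the third-party requests package is installed, and the file and folder names are placeholders.

# Download every file in a URL list to a local folder, a minimal Python
# alternative to the wget examples above. File and folder names are
# placeholders.
import os
import requests

os.makedirs("downloads", exist_ok=True)
with open("urls.txt") as f:
    for url in (line.strip() for line in f if line.strip()):
        name = os.path.join("downloads", url.rsplit("/", 1)[-1])
        with requests.get(url, stream=True, timeout=60) as r:
            r.raise_for_status()
            with open(name, "wb") as out:
                # Stream in chunks so large images are not held in memory.
                for chunk in r.iter_content(1 << 16):
                    out.write(chunk)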