Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation
By Phillipe A. Wernette, Daniel D. Buscombe, Jaycee Favela, Sharon Fitzpatrick, Evan Goldstein, Nicholas M. Enwright, and Erin Dunand
https://doi.org/10.5066/P91NP87I
Dates
Published: March 19, 2022
Summary
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from both geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single zipped file. Each zipped file contains a folder of NPZ format files, and a csv file containing metadata for each labeled image. Zipped folders files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes used to annotate the images, and {threedigitdatasetversion} is the three-digit code corresponding to the dataset version (in other words, 001 is version 1). Each zipped folder contains a collection of NPZ format files, each of which corresponds to an individual image, and a CSV file with metadata information for every image. An individual NPZ file is named after the image that it represents and contains a collection of the following variables: orig_image (original input image unedited), image (original input image after color balancing and normalization), classes (list of classes annotated and present in the labelled image), doodles (integer image of all image annotations), color_doodles (color image of doodles), label (labelled image created from the classes present in the annotations), and settings (annotation and machine learning settings used to generate the labelled image from annotations). All NPZ files can be extracted using the utilities available in Doodler (Buscombe, 2022; https://doi.org/10.5066/P9YVHL23), a process documented on the Coast Train website (https://dbuscombe-usgs.github.io/CoastTrain/docs/).
Land Cover Data
-
Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from both geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes used to annotate the images, and {threedigitdatasetversion} is the three-digit code corresponding to the dataset version (in other words, 001 is version 1). Each zipped folder contains a collection of NPZ format files, each of which corresponds to an individual image. An individual NPZ file is named after the image that it represents and contains (1) a CSV file with detail information for every image in the zip folder and (2) a collection of the following NPY files: orig_image.npy (original input image unedited), image.npy (original input image after color balancing and normalization), classes.npy (list of classes annotated and present in the labelled image), doodles.npy (integer image of all image annotations), color_doodles.npy (color image of doodles.npy), label.npy (labelled image created from the classes present in the annotations), and settings.npy (annotation and machine learning settings used to generate the labelled image from annotations). All NPZ files can be extracted using the utilities available in Doodler (Buscombe, 2022). A merged CSV file containing detail information on the complete imagery collection is available at the top level of this data release, details of which are available in the Entity and Attribute section of this metadata file.
Data Files
CoastTrain_imagery_details.csv - 1.2 MB - MD5:91bf2542a81eaef9747f2bea35552108
Landsat8_11_001.zip - 611.3 MB - MD5:ea393da251ab91bf0166fbb0621bdc88
Landsat8_12_001.zip - 12.2 MB - MD5:e0bce7461cf1fb3eea5216af3c0a0472
NAIP_11_001.zip - 4.0 GB - MD5:bab7a921dc9886ec98705cc2b5833f4c
NAIP_6_001.zip - 141.1 MB - MD5:43da19332bb1cf3b6c0dab37a38c7922
Orthophoto_12_001.zip - 501.4 MB - MD5:38bb34f126226886cec4d73e87d2a201
Orthophoto_8_001.zip - 1.3 GB - MD5:953e93fa2b23cd784654d90d262e3d7c
Orthophoto_9_001.zip - 1.5 GB - MD5:9ebdfbfa33ccfa54e761e2091b42e9d4
Quadrangles_7_001.zip - 326.4 MB - MD5:2f895398a53013defc0ecc50aed268c7
Sentinel2_11_001.zip - 361.6 MB - MD5:e86856c9c5603cefb79b8df29990e2d2
Sentinel2_4_001.zip - 100.8 MB - MD5:3d32b9bd7671153676bd1ec53607a56f
Metadata Files
CoastTrain_imagery_details_metadata.xml - 41.7 KB
CoastTrain_imagery_details_metadata.txt - 41.7 KB
Suggested Citation
Wernette, P.A., Buscombe, D.D., Favela, J., Fitzpatrick, S., and Goldstein E., 2022, Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation: U.S. Geological Survey data release, https://doi.org/10.5066/P91NP87I.