Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation

By Phillipe A. Wernette, Daniel D. Buscombe, Jaycee Favela, Sharon Fitzpatrick, Evan Goldstein, Nicholas M. Enwright, and Erin Dunand

https://doi.org/10.5066/P91NP87I


Dates

Published: March 19, 2022

Summary

Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from both geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single zipped file. Each zipped file contains a folder of NPZ format files, and a csv file containing metadata for each labeled image. Zipped folders files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes used to annotate the images, and {threedigitdatasetversion} is the three-digit code corresponding to the dataset version (in other words, 001 is version 1). Each zipped folder contains a collection of NPZ format files, each of which corresponds to an individual image, and a CSV file with metadata information for every image. An individual NPZ file is named after the image that it represents and contains a collection of the following variables: orig_image (original input image unedited), image (original input image after color balancing and normalization), classes (list of classes annotated and present in the labelled image), doodles (integer image of all image annotations), color_doodles (color image of doodles), label (labelled image created from the classes present in the annotations), and settings (annotation and machine learning settings used to generate the labelled image from annotations). All NPZ files can be extracted using the utilities available in Doodler (Buscombe, 2022; https://doi.org/10.5066/P9YVHL23), a process documented on the Coast Train website (https://dbuscombe-usgs.github.io/CoastTrain/docs/).

Land Cover Data

  • Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation

    Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from both geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes used to annotate the images, and {threedigitdatasetversion} is the three-digit code corresponding to the dataset version (in other words, 001 is version 1). Each zipped folder contains a collection of NPZ format files, each of which corresponds to an individual image. An individual NPZ file is named after the image that it represents and contains (1) a CSV file with detail information for every image in the zip folder and (2) a collection of the following NPY files: orig_image.npy (original input image unedited), image.npy (original input image after color balancing and normalization), classes.npy (list of classes annotated and present in the labelled image), doodles.npy (integer image of all image annotations), color_doodles.npy (color image of doodles.npy), label.npy (labelled image created from the classes present in the annotations), and settings.npy (annotation and machine learning settings used to generate the labelled image from annotations). All NPZ files can be extracted using the utilities available in Doodler (Buscombe, 2022). A merged CSV file containing detail information on the complete imagery collection is available at the top level of this data release, details of which are available in the Entity and Attribute section of this metadata file.

    Data Files

    CoastTrain_imagery_details.csv - 1.2 MB - MD5:91bf2542a81eaef9747f2bea35552108

    Landsat8_11_001.zip - 611.3 MB - MD5:ea393da251ab91bf0166fbb0621bdc88

    Landsat8_12_001.zip - 12.2 MB - MD5:e0bce7461cf1fb3eea5216af3c0a0472

    NAIP_11_001.zip - 4.0 GB - MD5:bab7a921dc9886ec98705cc2b5833f4c

    NAIP_6_001.zip - 141.1 MB - MD5:43da19332bb1cf3b6c0dab37a38c7922

    Orthophoto_12_001.zip - 501.4 MB - MD5:38bb34f126226886cec4d73e87d2a201

    Orthophoto_8_001.zip - 1.3 GB - MD5:953e93fa2b23cd784654d90d262e3d7c

    Orthophoto_9_001.zip - 1.5 GB - MD5:9ebdfbfa33ccfa54e761e2091b42e9d4

    Quadrangles_7_001.zip - 326.4 MB - MD5:2f895398a53013defc0ecc50aed268c7

    Sentinel2_11_001.zip - 361.6 MB - MD5:e86856c9c5603cefb79b8df29990e2d2

    Sentinel2_4_001.zip - 100.8 MB - MD5:3d32b9bd7671153676bd1ec53607a56f

    Metadata Files

    CoastTrain_imagery_details_metadata.xml - 41.7 KB

    CoastTrain_imagery_details_metadata.txt - 41.7 KB

    Preview of dataset
    Split graphic showing the original image overlaid with the segmented data.

Suggested Citation

Wernette, P.A., Buscombe, D.D., Favela, J., Fitzpatrick, S., and Goldstein E., 2022, Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation: U.S. Geological Survey data release, https://doi.org/10.5066/P91NP87I.

Overview Image

Overview image of data release
Split graphic showing the original image overlaid with the segmented data.