Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation


Frequently anticipated questions:

What does this data set describe?
Who produced the data set?
Why was the data set created?
How was the data set created?
How reliable are the data; what problems remain in the data set?
How can someone get a copy of the data set?
Who wrote the metadata?

What does this data set describe?

Title:

Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation

Abstract:

Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes geospatial imagery (satellite, aerial, and UAV imagery and orthomosaics) as well as non-geospatial oblique and nadir imagery. Images cover a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, and consist of time series of high-resolution (≤1 m) orthomosaics and satellite image tiles (10–30 m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files are distributed in zipped folders that use the naming convention {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes used to annotate the images, and {threedigitdatasetversion} is the three-digit code corresponding to the dataset version (in other words, 001 is version 1). Each zipped folder contains a collection of NPZ format files, each of which corresponds to an individual image. An individual NPZ file is named after the image that it represents and contains (1) a CSV file with detailed information for every image in the zip folder and (2) a collection of the following NPY files: orig_image.npy (original, unedited input image), image.npy (original input image after color balancing and normalization), classes.npy (list of classes annotated and present in the labelled image), doodles.npy (integer image of all image annotations), color_doodles.npy (color image of doodles.npy), label.npy (labelled image created from the classes present in the annotations), and settings.npy (annotation and machine learning settings used to generate the labelled image from annotations).
All NPZ files can be extracted using the utilities available in Doodler (Buscombe, 2022). A merged CSV file containing detailed information on the complete imagery collection is available at the top level of this data release; its contents are described in the Entity and Attribute section of this metadata file.
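
The naming convention and NPZ layout described in the abstract can be illustrated with a minimal sketch. It is not the Doodler utility itself. It uses only NumPy and the Python standard library, and it writes a small synthetic NPZ file so that it runs without the data release. The zip name "NAIP_11_001.zip", the function names, the array shapes, and the assumption that array keys match the NPY file names without the ".npy" extension are all hypothetical. If a real file stores object arrays, np.load would need allow_pickle=True, which should only be used on trusted files.

```python
import os
import re
import tempfile

import numpy as np

# Pattern for {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip.
# The greedy first group lets the data source itself contain underscores.
ZIP_NAME = re.compile(
    r"^(?P<datasource>.+)_(?P<numclasses>\d+)_(?P<version>\d{3})\.zip$"
)


def parse_zip_name(filename):
    """Split a zipped-folder name into data source, class count, and version."""
    match = ZIP_NAME.match(filename)
    if match is None:
        raise ValueError(f"name does not follow the convention: {filename}")
    return {
        "datasource": match["datasource"],
        "num_classes": int(match["numclasses"]),
        "version": int(match["version"]),
    }


def summarize_npz(path):
    """Return {array name: shape} for every array stored in one NPZ file."""
    with np.load(path) as data:
        return {key: data[key].shape for key in data.files}


# Build a tiny synthetic NPZ (placeholder shapes, not real Coast Train data).
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "example_image.npz")
np.savez(
    npz_path,
    orig_image=np.zeros((4, 4, 3), dtype=np.uint8),
    image=np.zeros((4, 4, 3), dtype=np.uint8),
    classes=np.array(["water", "sand"]),
    doodles=np.zeros((4, 4), dtype=np.uint8),
    label=np.zeros((4, 4), dtype=np.uint8),
)

info = parse_zip_name("NAIP_11_001.zip")  # hypothetical file name
shapes = summarize_npz(npz_path)
print(info)
print(shapes)
```

Listing data.files before reading any array is a cheap way to confirm which of the arrays named in the abstract are actually present in a given file.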

Supplemental_Information:

Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

How might this data set be cited?
Wernette, Phillipe A., Buscombe, Daniel D., Fitzpatrick, Sharon, Favela, Jaycee, Enwright, Nicholas, Goldstein, Evan, and Dunand, Erin, 20220319, Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation: data release DOI:10.5066/P91NP87I, U.S. Geological Survey, Pacific Coastal and Marine Science Center, Santa Cruz, California.
Online Links:
- https://doi.org/10.5066/P91NP87I
What geographic area does the data set cover?
West_Bounding_Coordinate: -180.0
East_Bounding_Coordinate: 180.0
North_Bounding_Coordinate: 90.0
South_Bounding_Coordinate: -90.0
What does it look like?

coast_train_thumbnail.png (PNG)
Split graphic showing the original image on the left half and the segmented image on the right half.
Does the data set describe conditions during a particular time period?

Beginning_Date: 01-Jan-2008

Ending_Date: 31-Dec-2020

Currentness_Reference:

date range of imagery in library
What is the general form of this data set?
Geospatial_Data_Presentation_Form:
list of images and details in CSV format; imagery in NumPy binary file format
How does the data set represent geographic features?
1. How are geographic features stored in the data set?
  
  Indirect_Spatial_Reference:
  Original images were downloaded at sites along the conterminous U.S. coastline, including sites along the U.S. Atlantic, Gulf of Mexico, Pacific, and Great Lakes coasts. Sites were selected to provide a representative sample of an array of coastal types (for example, sandy, cliff, marsh, wetland, developed). Refer to the self-contained NPZ files for more information on locations of original images.
2. What coordinate system is used to represent geographic features?

How does the data set describe geographic features?

CoastTrain_imagery_details.csv

Table containing detailed information about the imagery in this dataset. (Source: Producer defined)

name

Name of image source (Source: Producer defined) Unique identifier of image name.

publisher

Original publisher of the image source (Source: Producer defined) Unique identifier of image publisher.

labels

The image label file. One-hot-encoded label image (2D raster) in 8-bit unsigned integer. Each integer encodes a class label, incrementing through 'classes' starting at zero. (Source: Producer defined) Unique identifier of label image file.

images

The original image file used in classification. (Source: Producer defined) Unique identifier of image filename.

annotation_image_filename

Image filename with annotations. (Source: Producer defined) Unique identifier of annotation image filename.

classes_array

An array of classification classes in the image. (Source: Producer defined)

Value	Definition
water	Classified as water
whitewater	Classified as whitewater
surf	Classified as surf
mud_silt	Classified as mud or fine sediment
sand	Classified as bare sand
gravel	Classified as gravel
gravel_shell	Classified as mixture of gravel and shells
cobble_gravel	Classified as gravel with cobbles
bedrock	Classified as exposed bedrock
ice_snow	Classified as ice or snow
bare_ground	Classified as bare ground
sediment	Classified as bare sediment
other_natural_terrain	Classified as other natural terrain
other_bare_natural_terrain	Classified as other bare natural terrain
vegetated	Classified as vegetation
vegetated_ground	Classified as ground with vegetation
vegetated_surface	Classified as vegetation
marsh_vegetation	Classified as marsh vegetation
terrestrial_vegetation	Classified as terrestrial vegetation
agricultural	Classified as agricultural
cloud	Classified as cloud
development	Classified as consisting of human development
dev	Classified as consisting of human development
coastal_defense	Classified as coastal defense
buildings	Classified as building
pavement_road	Classified as pavement
vehicles	Classified as vehicles
people	Classified as person
other_anthro	Classified as other anthropogenic object
unusual	Classified as unusual object or land cover
unknown	Classified as unknown land cover
no_data	No data contained in pixels
nodata	No data contained in pixels
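
Two pairs of values in the table above share a single definition: development and dev, and no_data and nodata. A user who combines image sets may want to treat each pair as one class. The sketch below shows one possible way to do that. The function names, the choice of the longer spelling as the canonical one, and the whitespace and case handling are assumptions made for illustration, not part of the data release.

```python
# Alias pairs taken from the class table: both spellings share one definition.
# Treating the longer spelling as canonical is an arbitrary assumption.
ALIASES = {"dev": "development", "nodata": "no_data"}


def canonical_class(name):
    """Map one class name to its canonical spelling."""
    cleaned = name.strip().lower()  # assumption: values may carry stray spaces or case
    return ALIASES.get(cleaned, cleaned)


def canonical_classes(names):
    """Canonicalize a list of class names, dropping repeats and keeping order."""
    result = []
    for name in names:
        canonical = canonical_class(name)
        if canonical not in result:
            result.append(canonical)
    return result


merged = canonical_classes(["water", "dev", "development", "nodata"])
print(merged)  # ['water', 'development', 'no_data']
```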

num_classes

Number of classification classes in the image. (Source: Producer defined)

Range of values
Minimum:	1
Maximum:	12

classes_integer

One integer per class in num_classes. (Source: Producer defined)

Range of values
Minimum:	0
Maximum:	12

classes_present_integer

An array of integer classes present in the image. (Source: Producer defined)

Range of values
Minimum:	0
Maximum:	12

classes_present_array

An array of classes present in the image. (Source: Producer defined) Values present in image from classes
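
The relationship between a label image, classes_present_integer, and classes_present_array can be sketched as follows. The example assumes a 2D integer label image in which each value indexes into the list of classes starting at zero, as stated in the description of the labels attribute. The class list, the small array, and the function name are hypothetical. If a label array is instead stored as a one-hot stack with the class axis last, np.argmax(label, axis=-1) would first reduce it to this integer form.

```python
import numpy as np


def classes_present(label, classes):
    """Return the integer codes and class names that occur in a label image."""
    present_int = np.unique(label).tolist()  # sorted unique class codes
    present_names = [classes[code] for code in present_int]
    return present_int, present_names


# Hypothetical class list and 3x3 label image (8-bit unsigned integers).
classes = ["water", "whitewater", "sand", "vegetated"]
label = np.array(
    [[0, 0, 2],
     [0, 2, 2],
     [3, 3, 2]],
    dtype=np.uint8,
)

present_int, present_names = classes_present(label, classes)
print(present_int)    # [0, 2, 3]
print(present_names)  # ['water', 'sand', 'vegetated']
```

In this example whitewater (code 1) is in the class list but absent from the image, which is the distinction between classes_array and classes_present_array.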

pen_width

Final width in pixels of pen used to annotate in the Doodler program. (Source: Producer defined)

Range of values
Minimum:	1
Maximum:	10

CRF_theta