Phillipe A. Wernette
Daniel D. Buscombe
Sharon Fitzpatrick
Jaycee Favela
Nicholas Enwright
Evan Goldstein
Erin Dunand
20220319
Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation
list of images and details in csv format; imagery in NumPy binary file format
data release
DOI:10.5066/P91NP87I
Pacific Coastal and Marine Science Center, Santa Cruz, California
U.S. Geological Survey
https://doi.org/10.5066/P91NP87I
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files use the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes used to annotate the images, and {threedigitdatasetversion} is the three-digit code corresponding to the dataset version (in other words, 001 is version 1). Each zipped folder contains a collection of NPZ format files, each of which corresponds to an individual image. An individual NPZ file is named after the image that it represents and contains (1) a CSV file with detailed information for every image in the zip folder and (2) a collection of the following NPY files: orig_image.npy (original input image unedited), image.npy (original input image after color balancing and normalization), classes.npy (list of classes annotated and present in the labelled image), doodles.npy (integer image of all image annotations), color_doodles.npy (color image of doodles.npy), label.npy (labelled image created from the classes present in the annotations), and settings.npy (annotation and machine learning settings used to generate the labelled image from annotations).
All NPZ files can be extracted using the utilities available in Doodler (Buscombe, 2022). A merged CSV file containing detailed information on the complete imagery collection is available at the top level of this data release; its contents are described in the Entity and Attribute section of this metadata file.
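Besides the Doodler utilities, an NPZ file can usually be opened directly with NumPy. The following is a minimal sketch, not the release's own tooling: it builds a small synthetic NPZ in memory with the seven array names listed above and reads it back. The array shapes, dtypes, and values are placeholders chosen for illustration, and real files may need additional options (for example, allow_pickle=True if an array stores Python objects).

```python
import io

import numpy as np

# Build a small synthetic NPZ in memory so the sketch runs without the real data.
# Shapes, dtypes, and values are placeholders, not values from the data release.
rng = np.random.default_rng(0)
buffer = io.BytesIO()
np.savez_compressed(
    buffer,
    orig_image=rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8),
    image=rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8),
    classes=np.array(["water", "sand"]),
    doodles=np.zeros((4, 5), dtype=np.uint8),
    color_doodles=np.zeros((4, 5, 3), dtype=np.uint8),
    label=rng.integers(0, 2, size=(4, 5), dtype=np.uint8),
    settings=np.array([3, 1, 1]),
)
buffer.seek(0)

# np.load exposes each NPY member by its name without the ".npy" suffix.
with np.load(buffer) as npz:
    arrays = {key: npz[key] for key in npz.files}

label = arrays["label"]
classes = arrays["classes"]
print(sorted(arrays))
print(label.shape, classes.tolist())
```

With a file on disk, the in-memory buffer would be replaced by the path to the NPZ file.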
Training machine learning (ML) and other models for segmentation will greatly facilitate the creation of land cover maps from geospatial imagery with greater specificity, as well as mapping coastal sediments, transient waterbodies, landforms, and other features of interest, in both geospatial and non-geospatial imagery. Coast Train adheres to the principle of ‘Map Once, Use Many Times’ and is well positioned to transfer learning across a wide range of coastal environments.
Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
20080101
20201231
date range of imagery in library
As needed
-180.0
180.0
90.0
-90.0
USGS Metadata Identifier
USGS:9cdb71c1-cc5a-4786-9232-93d7e7a340cf
ISO 19115 Topic Category
environment
geoscientificinformation
Data Categories for Marine Planning
distributions
Physical Habitats and Geomorphology
habitat
infrastructure
structures
USGS Thesaurus
image analysis
mitigation of coastal hazards
remote sensing
aerial photography
multispectral imaging
visible light imaging
datasets
geospatial datasets
image collections
earth sciences
geography
geology
life sciences
botany
ecology
social sciences
biological and physical processes
hazards
human impacts
land use change
land use and land cover
coastal ecosystems
Marine Realms Information Bank (MRIB) keywords
agents of coastal change
coastal processes
anthropogenic agents of coastal change
coastal development
coastal protection structures
effects of coastal change
human responses to coastal change
earth system
bay
beach
cape
cliff
coast
coastal barrier
coastal plain
cove
dune
island
lagoon
lake
marsh
mudflat
ocean
shore
swamp
tidal flat
tidal inlet
beach zone communities
breakwater/shoreline stabilization structure
bridge
canal
jetty
environment
hazards and disasters
erosion
floods
remote sensing
aerial and satellite photography
biology
ecology
computer science
geography
environmental geography
physical geography
geology
information science
None
U.S. Geological Survey
USGS
Coastal and Marine Hazards and Resources Program
CMHRP
Pacific Coastal and Marine Science Center
PCMSC
St. Petersburg Coastal and Marine Science Center
SPCMSC
Wetland and Aquatic Research Center
WARC
Woods Hole Coastal and Marine Science Center
WHCMSC
Geographic Names Information System (GNIS)
United States
None
USGS-authored or produced data and information are in the public domain from the U.S. Government and are freely redistributable with proper metadata and source attribution. Please recognize and acknowledge the U.S. Geological Survey as the originator(s) of the dataset and in products derived from these data.
U.S. Geological Survey, Pacific Coastal and Marine Science Center
PCMSC Science Data Coordinator
mailing and physical
2885 Mission Street
Santa Cruz
CA
95060
831-427-4747
pcmsc_data@usgs.gov
coast_train_thumbnail.png
Split graphic with the original image on the left half and the segmented image on the right half.
PNG
Images were processed using Doodler software (Buscombe, 2022) in Windows operating systems.
Daniel D. Buscombe
2022
Doodler--A web application built with plotly/dash for image segmentation with minimal supervision
software
software release
DOI:10.5066/P9YVHL23
Pacific Coastal and Marine Science Center, Santa Cruz, California
U.S. Geological Survey
Buscombe, D.D., 2022, Doodler--A web application built with plotly/dash for image segmentation with minimal supervision: U.S. Geological Survey software release, https://doi.org/10.5066/P9YVHL23
https://doi.org/10.5066/P9YVHL23
Mean Intersection over Union (IoU) scores for quantifying inter-labeler agreement were computed using 120 images across two datasets, namely NAIP (70 image pairs) and Sentinel-2 (50 image pairs), that have been labeled independently by experienced labelers. Mean IoU is the standard way to report agreement between two realizations of the same label image. Further, because IoU quantifies spatial overlap and is sensitive to class imbalance, Kullback-Leibler divergence scores were also computed to quantify agreement between class-frequency distributions. When comparing IoU and Dice scores, it is preferable to examine agreement using multiple independent metrics. The mean of mean IoU scores was 0.88, which we recommend using as an expected irreducible error. Previous research suggests that mean IoU scores tend to be inversely correlated with the number of classes; therefore, this error is a conservative estimate.
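The two agreement measures named above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used to produce the reported 0.88: it assumes label images are 2D integer arrays, averages IoU over classes present in at least one of the two images, and adds a small constant (eps, a hypothetical smoothing choice) to the class frequencies so the Kullback-Leibler divergence stays finite.

```python
import numpy as np


def mean_iou(label_a, label_b, num_classes):
    """Mean per-class intersection over union between two integer label images."""
    scores = []
    for c in range(num_classes):
        a = label_a == c
        b = label_b == c
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue  # class absent from both images, so it is skipped
        scores.append(np.logical_and(a, b).sum() / union)
    return float(np.mean(scores))


def class_frequency_kl(label_a, label_b, num_classes, eps=1e-12):
    """KL divergence between the class-frequency distributions of two label images."""
    p = np.bincount(label_a.ravel(), minlength=num_classes).astype(float)
    q = np.bincount(label_b.ravel(), minlength=num_classes).astype(float)
    p = p / p.sum() + eps  # eps avoids log(0) and division by zero
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))


# Two small synthetic label images that differ in a single pixel.
labeler_1 = np.array([[0, 0, 1], [1, 1, 2]], dtype=np.uint8)
labeler_2 = np.array([[0, 1, 1], [1, 1, 2]], dtype=np.uint8)

iou = mean_iou(labeler_1, labeler_2, num_classes=3)
kl = class_frequency_kl(labeler_1, labeler_2, num_classes=3)
print(iou, kl)
```

In this toy case the per-class IoU values are 0.5, 0.75, and 1.0, so the mean is 0.75. The divergence is small but nonzero because the class frequencies differ slightly between the two images.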
All annotation values are integer based, with each integer corresponding to a unique class. The program used to generate the final classified/labelled images ensured that every pixel in the original image is classified into one of the annotated classes. There is no possibility that the actual values are outside of the reported ranges of values.
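A user who wants to confirm this property on a downloaded file could run a check along the following lines. It is a hedged sketch that assumes the label image is a 2D integer array whose values index into the class list, starting at zero. The function name label_is_consistent is hypothetical and not part of Doodler.

```python
import numpy as np


def label_is_consistent(label, classes):
    """Return True if every pixel holds an integer index into the class list."""
    label = np.asarray(label)
    return bool(
        label.ndim == 2
        and np.issubdtype(label.dtype, np.integer)
        and label.min() >= 0
        and label.max() < len(classes)
    )


# Synthetic examples: the second label image holds a value with no matching class.
classes = ["water", "sand", "cloud"]
good = np.array([[0, 1], [1, 2]], dtype=np.uint8)
bad = np.array([[0, 1], [1, 3]], dtype=np.uint8)
print(label_is_consistent(good, classes), label_is_consistent(bad, classes))
```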
Dataset is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details.
A formal accuracy assessment of the horizontal positional information in the dataset has not been conducted. Each of the input data sources has its own horizontal accuracy available in their source metadata.
Image Annotation/Doodling--Each image was opened using Doodler with the class list provided in the metadata sheet. The user then selected a single class from the options on the right and clicked and held on the image to draw a line (annotation/doodle) where the selected class exists on the image. Annotations can be as simple as a point or a single line, or as complex as a meandering or looping series of lines. This annotation process was repeated one or more times for every class present in the image.
20211231
Image Classification/Segmentation--Once image annotations were complete for all classes present in the image, the user checked the “Compute/Show segmentation” box on the right, prompting the program to segment the image and classify every pixel in it. If the final image did not accurately represent the classes present and their distribution, the user could uncheck the “Compute/Show segmentation” box and repeat the annotation and classification/segmentation steps until satisfied with the final segmented image.
20211231
Edited to correct spelling of author name. No data were changed. (scochran@usgs.gov)
20230504
Original images were downloaded at sites along the conterminous U.S. coastline, including sites along the U.S. Atlantic, Gulf of Mexico, Pacific, and Great Lakes coasts. Sites were selected to provide a representative sample of an array of coastal types (for example, sandy, cliff, marsh, wetland, developed). Refer to the self-contained NPZ files for more information on locations of original images.
CoastTrain_imagery_details.csv
Table containing detailed information about the imagery in this dataset.
Producer defined
name
Name of image source
Producer defined
Unique identifier of image name.
publisher
Original publisher of the image source
Producer defined
Unique identifier of image publisher.
labels
The image label file. Integer-encoded label image (2D raster) stored as 8-bit unsigned integers. Each integer encodes a class label, incrementing through 'classes' starting at zero.
Producer defined
Unique identifier of label image file.
images
The original image file used in classification.
Producer defined
Unique identifier of image filename.
annotation_image_filename
Image filename with annotations.
Producer defined
Unique identifier of annotation image filename.
classes_array
An array of classification classes in the image.
Producer defined
water
Classified as water
Producer defined
whitewater
Classified as whitewater
Producer defined
surf
Classified as surf
Producer defined
mud_silt
Classified as mud or fine sediment
Producer defined
sand
Classified as bare sand
Producer defined
gravel
Classified as gravel
Producer defined
gravel_shell
Classified as mixture of gravel and shells
Producer defined
cobble_gravel
Classified as gravel with cobbles
Producer defined
bedrock
Classified as exposed bedrock
Producer defined
ice_snow
Classified as ice or snow
Producer defined
bare_ground
Classified as bare ground
Producer defined
sediment
Classified as bare sediment
Producer defined
other_natural_terrain
Classified as other natural terrain
Producer defined
other_bare_natural_terrain
Classified as other bare natural terrain
Producer defined
vegetated
Classified as vegetation
Producer defined
vegetated_ground
Classified as ground with vegetation
Producer defined
vegetated_surface
Classified as vegetation
Producer defined
marsh_vegetation
Classified as marsh vegetation
Producer defined
terrestrial_vegetation
Classified as terrestrial vegetation
Producer defined
agricultural
Classified as agricultural
Producer defined
cloud
Classified as cloud
Producer defined
development
Classified as consisting of human development
Producer defined
dev
Classified as consisting of human development
Producer defined
coastal_defense
Classified as coastal defense
Producer defined
buildings
Classified as building
Producer defined
pavement_road
Classified as pavement
Producer defined
vehicles
Classified as vehicles
Producer defined
people
Classified as person
Producer defined
other_anthro
Classified as other anthropogenic object
Producer defined
unusual
Classified as unusual object or land cover
Producer defined
unknown
Classified as unknown land cover
Producer defined
no_data
No data contained in pixels
Producer defined
nodata
No data contained in pixels
Producer defined
num_classes
Number of classification classes in the image.
Producer defined
1
12
classes_integer
One integer per class in num_classes.
Producer defined
0
12
classes_present_integer
An array of integer classes present in the image.
Producer defined
0
12
classes_present_array
An array of classes present in the image.
Producer defined
Values present in image from classes
pen_width
Final width in pixels of pen used to annotate in the Doodler program.
Producer defined
1
10
CRF_theta
Internal classifier hyperparameter used by the Doodler program.
Producer defined
1
1
CRF_mu
Internal classifier hyperparameter used by the Doodler program.
Producer defined
1
99
CRF_downsample_factor
Internal classifier hyperparameter used by the Doodler program.
Producer defined
1
5
Classifier_downsample_factor
Internal classifier hyperparameter used by the Doodler program.
Producer defined
1
8
prob_of_unary_potential
Internal classifier hyperparameter used by the Doodler program.
Producer defined
0.1
3.0
doodle_spatial_density
Proportion of the image annotated.
Producer defined
0.000526323
0.999627422
num_of_scales
Internal classifier hyperparameter used by the Doodler program.
Producer defined
3
3
acc_georef
Accuracy, in meters, of the specification of 'XMin', 'XMax' and 'YMin', 'YMax'.
Producer defined
1
11.248
epsg
EPSG code for the projected coordinate system. See 'CRS' attribute for a complete description of codes used.
Producer defined
26910
32618
year
Acquisition year of the image source.
Producer defined
2008
2021
month
Acquisition month of the image source.
Producer defined
1
12
day
Acquisition day of the image source.
Producer defined
1
31
hour
Acquisition hour of the image source.
Producer defined
0
23
minute
Acquisition minute of the image source.
Producer defined
0
59
second
Acquisition second of the image source.
Producer defined
0
59
XMin
Minimum easting of the image footprint.
Producer defined
233870.0
787860.0
XMax
Maximum easting of the image footprint.
Producer defined
235750.0
790530.0
YMin
Minimum northing of the image footprint.
Producer defined
2875253.0
5332914.0
YMax
Maximum northing of the image footprint.
Producer defined
2884030.0
5333378.0
LonMin
Minimum longitude (WGS84) of image footprint.
Producer defined
-124.0922272
-69.95201111
LonMax
Maximum longitude (WGS84) of image footprint.
Producer defined
-124.0478924
-69.9405098
LatMin
Minimum latitude (WGS84) of image footprint.
Producer defined
25.98761753
48.14810677
LatMax
Maximum latitude (WGS84) of image footprint.
Producer defined
26.06287667
48.15232107
CRS
The projected coordinate system description relating to XMin, XMax, YMin, YMax.
Producer defined
Projected coordinate system definition
px_m
Horizontal size of pixel in meters.
Producer defined
0.15
15
ImageHeightPx
Number of pixels in the vertical dimension (image height).
Producer defined
31
2481
ImageWidthPx
Number of pixels in the horizontal dimension (image width).
Producer defined
32
2209
ImageBands
Number of bands in the image.
Producer defined
3
3
Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files use the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes used to annotate the images, and {threedigitdatasetversion} is the three-digit code corresponding to the dataset version (in other words, 001 is version 1). Each zipped folder contains a collection of NPZ format files, each of which corresponds to an individual image. An individual NPZ file is named after the image that it represents and contains (1) a CSV file with metadata information for every image and (2) a collection of the following NPY files: orig_image.npy (original input image unedited), image.npy (original input image after color balancing and normalization), classes.npy (list of classes annotated and present in the labelled image), doodles.npy (integer image of all image annotations), color_doodles.npy (color image of doodles.npy), label.npy (labelled image created from the classes present in the annotations), and settings.npy (annotation and machine learning settings used to generate the labelled image from annotations). All NPZ files can be extracted using the utilities available in Doodler (Buscombe, 2022; https://doi.org/10.5066/P9YVHL23).
The entity and attribute information was generated by the individual and/or agency identified as the originator of the data set. Please review the rest of the metadata record for additional details and information.
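As an illustration of how the attributes in CoastTrain_imagery_details.csv might be combined, the sketch below reads a few columns and derives the ground footprint of each image from px_m, ImageWidthPx, and ImageHeightPx. The header names follow the attribute labels in this record, but the two rows are invented for illustration and the helper footprint_size_m is hypothetical. It also assumes square pixels.

```python
import csv
import io

# Hypothetical two-row excerpt: header names follow the attribute labels in this
# record, but the row values are invented for illustration only.
sample = io.StringIO(
    "name,num_classes,px_m,ImageHeightPx,ImageWidthPx,epsg\n"
    "NAIP,6,1.0,1024,768,26910\n"
    "Sentinel-2,4,10.0,512,512,32618\n"
)


def footprint_size_m(row):
    """Return (width, height) of the image footprint in meters, assuming square pixels."""
    pixel_size = float(row["px_m"])
    width_m = pixel_size * int(row["ImageWidthPx"])
    height_m = pixel_size * int(row["ImageHeightPx"])
    return width_m, height_m


rows = list(csv.DictReader(sample))
sizes = {row["name"]: footprint_size_m(row) for row in rows}
print(sizes)
```

For the real table, the in-memory sample would be replaced by an open file handle to the downloaded CSV.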
U.S. Geological Survey - CMGDS
mailing and physical
2885 Mission Street
Santa Cruz
CA
95060
831-427-4747
pcmsc_data@usgs.gov
Images are provided in NPZ format. Each NPZ file corresponds to a single image that has been annotated/labelled and classified/segmented. The NPZ file names consist of {image_source}_{number_of_classes}_{data_release_version_number}, delimited by underscores. The first element {image_source} represents the original source of the image (for example, Landsat 8 would be “L8”, Sentinel 2 would be “S2”), the second element {number_of_classes} represents the number of classes used during labelling (for example, “6 classes”, “11classes”), and the third element {data_release_version_number} represents the data release version that the image is part of (for example, all datasets for version 1 will have “001” as the third part of the NPZ filename). Each NPZ file contains at least seven (7) different NPY files: (1) orig_image.npy (original input image unedited), (2) image.npy (original input image after color balancing and normalization), (3) classes.npy (list of classes annotated and present in the labelled image), (4) doodles.npy (integer image of all image annotations), (5) color_doodles.npy (color image of doodles.npy), (6) label.npy (labelled image created from the classes present in the annotations), and (7) settings.npy (annotation and machine learning settings used to generate the labelled image from annotations). Some NPZ files may contain one or more additional sets of seven files with one or more zeros prepended to the NPY file names. These additional files are grouped by the number of zeros preceding the regular file names described above and represent previous attempts at annotation and classification/segmentation for that image. For example, all NPY files with one zero prepended to the file name represent the first attempt, all NPY files with two zeros prepended represent the second attempt, and so on.
A merged CSV file (CoastTrain_imagery_details.csv) contains detailed information on the complete imagery collection.
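The naming convention and the zero-prefixed attempt files described above could be handled programmatically along these lines. This is a sketch under assumptions: the example file name S2_11classes_001.zip is hypothetical, the exact spelling of the class-count element may vary between files, and the zero prefix is assumed to appear directly before the array name (for example, 0label). Neither helper function is part of Doodler.

```python
import re
from collections import defaultdict


def parse_archive_name(filename):
    """Split an underscore-delimited name into (source, number of classes, version)."""
    stem = filename.rsplit(".", 1)[0]
    source, classes, version = stem.rsplit("_", 2)
    num_classes = int(re.search(r"\d+", classes).group())
    return source, num_classes, int(version)


def group_by_attempt(keys):
    """Group array names by the number of leading zeros (0 means the final result)."""
    groups = defaultdict(list)
    for key in keys:
        stripped = key.lstrip("0")
        groups[len(key) - len(stripped)].append(stripped)
    return dict(groups)


parsed = parse_archive_name("S2_11classes_001.zip")  # hypothetical file name
attempts = group_by_attempt(["label", "0label", "00label", "doodles", "0doodles"])
print(parsed)
print(attempts)
```

Here group 0 holds the final arrays, group 1 the first attempt, and group 2 the second attempt, matching the description above.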
Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty.
comma-delimited text
none
https://doi.org/10.5066/P91NP87I
CSV files can be downloaded using the Network_Resource_Name link then scrolling down to the Land Cover Data section.
NPY
Each ZIP file contains multiple NPZ files. Individual NPZ files can be unzipped and all NPY files converted to readable raster formats by using utility scripts associated with Doodler (Buscombe, 2022; https://doi.org/10.5066/P9YVHL23). NPZ files were compressed using deflate compression.
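For users who prefer not to unpack archives to disk, the ZIP-of-NPZ layout can also be read in memory with the Python standard library and NumPy. The sketch below is illustrative only and is not a Doodler utility: it builds a tiny deflate-compressed ZIP containing two synthetic NPZ files (the member names image_a.npz and image_b.npz are hypothetical) and then iterates over the members to read each label array.

```python
import io
import zipfile

import numpy as np


def make_npz_bytes(label):
    """Serialize a single synthetic label array into NPZ bytes."""
    buf = io.BytesIO()
    np.savez_compressed(buf, label=label)
    return buf.getvalue()


# Build a synthetic archive so the sketch runs without the real data.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("image_a.npz", make_npz_bytes(np.zeros((2, 3), dtype=np.uint8)))
    zf.writestr("image_b.npz", make_npz_bytes(np.ones((4, 4), dtype=np.uint8)))
archive.seek(0)

# Iterate over NPZ members and load each one from memory.
label_shapes = {}
with zipfile.ZipFile(archive) as zf:
    for member in zf.namelist():
        if not member.endswith(".npz"):
            continue  # skip any non-NPZ members, such as CSV files
        with np.load(io.BytesIO(zf.read(member))) as npz:
            label_shapes[member] = npz["label"].shape
print(label_shapes)
```

With a downloaded archive, the in-memory buffer would be replaced by the path to the ZIP file.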
https://doi.org/10.5066/P91NP87I
Imagery data can be downloaded using the Network_Resource_Name link then scrolling down to the Land Cover Data section.
None.
These data can be viewed with Doodler software (Buscombe, 2022; https://doi.org/10.5066/P9YVHL23).
20220504
U.S. Geological Survey, Pacific Coastal and Marine Science Center
PCMSC Science Data Coordinator
mailing and physical
2885 Mission Street
Santa Cruz
CA
95060
831-427-4747
pcmsc_data@usgs.gov
Content Standard for Digital Geospatial Metadata
FGDC-STD-001-1998