Labeled satellite imagery for training machine learning semantic segmentation models of coastal shorelines.

Metadata:

Identification_Information:
Citation:
Citation_Information:
Originator: Daniel Buscombe
Originator: Mark A. Lundine
Originator: Catherine N. Janda
Originator: Sharon Batiste
Publication_Date: 20250325
Title:
Labeled satellite imagery for training machine learning semantic segmentation models of coastal shorelines.
Geospatial_Data_Presentation_Form: JPEG
Series_Information:
Series_Name: data release
Issue_Identification: DOI:10.5066/P13EOBZQ
Publication_Information:
Publication_Place:
Pacific Coastal and Marine Science Center, Santa Cruz, California
Publisher: U.S. Geological Survey
Other_Citation_Details:
Suggested Citation: Buscombe, D., Lundine, M.A., Janda, C.N., and Batiste, S., 2025, Labeled satellite imagery for training machine learning semantic segmentation models of coastal shorelines, U.S. Geological Survey data release, https://doi.org/10.5066/P13EOBZQ.
Online_Linkage: https://doi.org/10.5066/P13EOBZQ
Description:
Abstract:
This dataset comprises Landsat, Sentinel, and PlanetScope satellite images of coastal shoreline regions and their corresponding semantic segmentations. The dataset consists of folders of images and label images. Label images are images in which a human annotator has assigned each pixel one of the following discrete classes: a) water, b) whitewater/surf, c) sediment, and d) other. These data are intended only to be used as a training and validation dataset for a machine-learning-based image segmentation model designed specifically for semantic segmentation of satellite imagery of coastal shorelines.
Purpose:
These data support automated detection of coastal shoreline position and are intended for resource managers, science researchers, students, and the general public. The data can be viewed with image viewing software and can be used within specialist software to train Machine Learning models that segment imagery into water, whitewater/surf, sediment, and 'other' classes for shoreline mapping. Other potential uses include mapping sand and whitewater/surf from geospatial imagery. The imagery is organized into three folders, which constitute three separate versions of the dataset. The first folder contains visible-spectrum, or RGB (Red-Green-Blue), imagery and corresponding label images. The second folder contains NDWI (Normalized Difference Water Index) imagery and associated labels. The third folder contains MNDWI (Modified Normalized Difference Water Index) imagery and associated labels. NDWI and MNDWI spectral index imagery are commonly used for water and shoreline detection. Labels were created using the image annotation tool Doodler (Buscombe, 2022; Buscombe and others, 2022).
Supplemental_Information:
This data release was funded by the USGS Coastal and Marine Hazards and Resources Program. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. This data release contains modified PlanetScope imagery, provided under the NASA (National Aeronautics and Space Administration) CSDA (Commercial Satellite Data Acquisition) program under the standard Scientific Use License available at https://cdn.earthdata.nasa.gov/conduit/upload/14226/PlanetEULA042220.pdf, and the End User license agreement available at https://earthdata.nasa.gov/s3fs-public/2022-02/Planet_Expanded_EULA_06-21.pdf. This license permits redistribution of imagery in significantly modified form. We provide only the visible (R, G, and B) bands of small sub-portions of downloaded tiles, in png format. As such, the original imagery (multispectral scenes in geotiff format) has been cropped, stripped of its geospatial information, and re-encoded into 8-bit png format.
Time_Period_of_Content:
Time_Period_Information:
Range_of_Dates/Times:
Beginning_Date: 1984
Ending_Date: 2024
Currentness_Reference: collection years of satellite imagery.
Status:
Progress: Complete
Maintenance_and_Update_Frequency: None Planned
Spatial_Domain:
Bounding_Coordinates:
West_Bounding_Coordinate: -180.00000
East_Bounding_Coordinate: 180.00000
North_Bounding_Coordinate: 90.00000
South_Bounding_Coordinate: -90.00000
Keywords:
Theme:
Theme_Keyword_Thesaurus: USGS Metadata Identifier
Theme_Keyword: USGS:3f0efdeb-7bad-45f8-bbbe-7f70c78929e0
Theme:
Theme_Keyword_Thesaurus: Global Change Master Directory
Theme_Keyword: Hazards Planning
Theme_Keyword: Ocean Waves
Theme_Keyword: Erosion
Theme_Keyword: Sea Level Rise
Theme_Keyword: Extreme Weather
Theme:
Theme_Keyword_Thesaurus: ISO 19115 Topic Category
Theme_Keyword: Oceans
Theme_Keyword: ClimatologyMeteorologyAtmosphere
Theme:
Theme_Keyword_Thesaurus: Data Categories for Marine Planning
Theme_Keyword: Physical Habitats and Geomorphology
Theme:
Theme_Keyword_Thesaurus: USGS Thesaurus
Theme_Keyword: Climate Change
Theme_Keyword: Storms
Theme_Keyword: Sea-level Change
Theme:
Theme_Keyword_Thesaurus: Marine Realms Information Bank (MRIB) keywords
Theme_Keyword: sea level change
Theme_Keyword: waves
Theme_Keyword: coastal erosion
Theme:
Theme_Keyword_Thesaurus: None
Theme_Keyword: U.S. Geological Survey
Theme_Keyword: USGS
Theme_Keyword: Coastal and Marine Hazards and Resources Program
Theme_Keyword: CMHRP
Theme_Keyword: Pacific Coastal and Marine Science Center
Theme_Keyword: PCMSC
Access_Constraints: No access constraints
Use_Constraints:
USGS-authored or produced data and information are in the public domain from the U.S. Government and are freely redistributable with proper metadata and source attribution. Please recognize and acknowledge the U.S. Geological Survey as the originator(s) of the dataset and in products derived from these data. Users are advised to read the rest of the metadata record carefully for additional details.
Point_of_Contact:
Contact_Information:
Contact_Organization_Primary:
Contact_Organization:
U.S. Geological Survey, Pacific Coastal and Marine Science Center
Contact_Person: PCMSC Science Data Coordinator
Contact_Address:
Address_Type: mailing and physical
Address: 2885 Mission Street
City: Santa Cruz
State_or_Province: CA
Postal_Code: 95060
Contact_Voice_Telephone: 831-427-4747
Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov
Data_Set_Credit:
This data release was funded by the USGS Coastal and Marine Hazards and Resources Program.
Native_Data_Set_Environment:
The datasets were created on a Windows 11 operating system using Python 3.10. Results were output and saved in JPEG format.
Cross_Reference:
Citation_Information:
Originator: Daniel D. Buscombe
Publication_Date: 2022
Title:
Doodler--A web application built with plotly/dash for image segmentation with minimal supervision
Geospatial_Data_Presentation_Form: software
Series_Information:
Series_Name: software release
Issue_Identification: DOI:10.5066/P9YVHL23
Publication_Information:
Publication_Place:
Pacific Coastal and Marine Science Center, Santa Cruz, California
Publisher: U.S. Geological Survey
Other_Citation_Details:
Buscombe, D.D., 2022, Doodler--A web application built with plotly/dash for image segmentation with minimal supervision: U.S. Geological Survey software release, https://doi.org/10.5066/P9YVHL23
Online_Linkage: https://doi.org/10.5066/P9YVHL23
Cross_Reference:
Citation_Information:
Originator: Daniel D. Buscombe
Originator: Evan B. Goldstein
Originator: Chris R. Sherwood
Originator: Cameron Bodine
Originator: Jenna A. Brown
Originator: Jaycee Favela
Originator: Sharon Fitzpatrick
Originator: Christine J. Kranenburg
Originator: Jin-Si R. Over
Originator: Andrew C. Ritchie
Originator: Jonathan A. Warrick
Publication_Date: 2022
Title: Human‐in‐the‐loop segmentation of Earth surface imagery
Other_Citation_Details:
Buscombe, D., Goldstein, E.B., Sherwood, C.R., Bodine, C., Brown, J.A., Favela, J., Fitzpatrick, S., Kranenburg, C.J., Over, J.R., Ritchie, A.C. and Warrick, J.A., 2022. Human‐in‐the‐loop segmentation of Earth surface imagery. Earth and Space Science, 9(3), p.e2021EA002085.
Online_Linkage: https://doi.org/10.1029/2021EA002085
Cross_Reference:
Citation_Information:
Originator: Buscombe, D.
Originator: Goldstein, E.B.
Publication_Date: 2022
Title:
A reproducible and reusable pipeline for segmentation of geoscientific imagery
Other_Citation_Details:
Buscombe, D., and Goldstein, E.B., 2022, A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, v. 9, e2022EA002332.
Online_Linkage: https://doi.org/10.1029/2022EA002332
Cross_Reference:
Citation_Information:
Originator: Fitzpatrick, S.
Originator: Buscombe, D.
Originator: Warrick, J.A.
Originator: Lundine, M.A.
Originator: Vos, K.
Publication_Date: 2024
Title:
CoastSeg: an accessible and extendable hub for satellite-derived-shoreline (SDS) detection and mapping
Other_Citation_Details:
Fitzpatrick, S., Buscombe, D., Warrick, J.A., Lundine, M.A., and Vos, K., 2024, CoastSeg: an accessible and extendable hub for satellite-derived-shoreline (SDS) detection and mapping. Journal of Open Source Software, 9(99), 6683
Online_Linkage: https://doi.org/10.21105/joss.06683
Cross_Reference:
Citation_Information:
Originator: Vos, K.
Originator: Harley, M.D.
Originator: Splinter, K.D.
Originator: Simmons, J.A.
Originator: Turner, I.L.
Publication_Date: 2019
Title:
Sub-annual to multi-decadal shoreline variability from publicly available satellite imagery
Other_Citation_Details:
Vos, K., Harley, M.D., Splinter, K.D., Simmons, J.A., and Turner, I.L., 2019, Sub-annual to multi-decadal shoreline variability from publicly available satellite imagery: Coastal Engineering, v. 150, p. 160-174.
Online_Linkage: https://doi.org/10.1016/j.coastaleng.2019.04.004
Cross_Reference:
Citation_Information:
Originator: Gorelick, N.
Originator: Hancher, M.
Originator: Dixon, M.
Originator: Ilyushchenko, S.
Originator: Thau, D.
Originator: Moore, R.
Publication_Date: 2017
Title:
Google Earth Engine: Planetary-scale geospatial analysis for everyone.
Other_Citation_Details:
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., and Moore, R., 2017, Google Earth Engine: Planetary-scale geospatial analysis for everyone: Remote Sensing of Environment, v. 202, p. 18-27.
Online_Linkage: https://doi.org/10.1016/j.rse.2017.06.031
Data_Quality_Information:
Logical_Consistency_Report:
Data have undergone QA/QC and fall within expected/reasonable ranges.
Completeness_Report:
Data set is considered complete for the information presented. Users are advised to read the rest of the metadata record carefully for additional details.
Lineage:
Source_Information:
Source_Citation:
Citation_Information:
Originator: U.S. Geological Survey
Publication_Date: 2025
Title: Landsat imagery (from Landsat 8-9)
Geospatial_Data_Presentation_Form: PNG image
Publication_Information:
Publication_Place: online
Publisher: U.S. Geological Survey
Online_Linkage: https://doi.org/10.5066/P9OGBGM6
Type_of_Source_Media: online database
Source_Time_Period_of_Content:
Time_Period_Information:
Range_of_Dates/Times:
Beginning_Date: 19840101
Ending_Date: 20241231
Source_Currentness_Reference: collection years of satellite imagery
Source_Citation_Abbreviation: Landsat imagery
Source_Contribution:
The archive of Landsat 8 and Landsat 9 satellite imagery was accessed through Google Earth Engine.
Source_Information:
Source_Citation:
Citation_Information:
Originator: U.S. Geological Survey
Publication_Date: 2025
Title: Landsat imagery (from Landsat 7)
Geospatial_Data_Presentation_Form: PNG image
Publication_Information:
Publication_Place: online
Publisher: U.S. Geological Survey
Online_Linkage: https://doi.org/10.5066/P9C7I13B
Type_of_Source_Media: online database
Source_Time_Period_of_Content:
Time_Period_Information:
Range_of_Dates/Times:
Beginning_Date: 19840101
Ending_Date: 20241231
Source_Currentness_Reference: collection years of satellite imagery
Source_Citation_Abbreviation: Landsat imagery
Source_Contribution:
The archive of Landsat 7 satellite imagery was accessed through Google Earth Engine.
Source_Information:
Source_Citation:
Citation_Information:
Originator: U.S. Geological Survey
Publication_Date: 2025
Title: Landsat imagery (from Landsat 5)
Geospatial_Data_Presentation_Form: PNG image
Publication_Information:
Publication_Place: online
Publisher: U.S. Geological Survey
Online_Linkage: https://doi.org/10.5066/P9IAXOVV
Type_of_Source_Media: online database
Source_Time_Period_of_Content:
Time_Period_Information:
Range_of_Dates/Times:
Beginning_Date: 19840101
Ending_Date: 20241231
Source_Currentness_Reference: collection years of satellite imagery
Source_Citation_Abbreviation: Landsat imagery
Source_Contribution:
The archive of Landsat 5 satellite imagery was accessed through Google Earth Engine.
Source_Information:
Source_Citation:
Citation_Information:
Originator: Planet Labs PBC
Publication_Date: 2025
Title: PlanetScope imagery
Geospatial_Data_Presentation_Form: JPEG image
Publication_Information:
Publication_Place: online
Publisher: Planet Labs PBC
Type_of_Source_Media: online database
Source_Time_Period_of_Content:
Time_Period_Information:
Range_of_Dates/Times:
Beginning_Date: 20150101
Ending_Date: 20241231
Source_Currentness_Reference: collection years of satellite imagery
Source_Citation_Abbreviation: PlanetScope imagery
Source_Contribution:
The archive of PlanetScope satellite imagery was accessed through the PlanetScope Application Programming Interface.
Process_Step:
Process_Description:
Set up the CoastSeg toolbox (Fitzpatrick and others, 2024) for implementation along the region of interest. The toolbox was set up in Python 3.10 to run over coastline at numerous locations worldwide, for the time period of 01 March 1984 to 31 December 2024. Visible-band (RGB) images were constructed by concatenating the Red, Green, and Blue bands and saving the result in JPEG format. NDWI images were computed using the Near-Infrared (NIR) and Green bands of the original multispectral imagery, and saved in JPEG format. MNDWI images were computed using the Shortwave-Infrared (SWIR) and Green bands of the original multispectral imagery, and saved in JPEG format. MNDWI images were not computed for PlanetScope imagery because it has no Shortwave-Infrared (SWIR) band. Images were then labeled using Doodler (Buscombe, 2022; Buscombe and others, 2022).
Process_Date: 20240701
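The index computation described in this process step can be sketched as follows. This record states only which bands were used, so the expressions below are assumptions: they are the commonly used normalized-difference definitions, NDWI = (Green - NIR) / (Green + NIR) and MNDWI = (Green - SWIR) / (Green + SWIR). The function names and the linear rescaling to 8-bit values are hypothetical and are not taken from CoastSeg.

```python
import numpy as np


def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) per pixel, with 0 where a + b == 0."""
    a = np.asarray(band_a, dtype=np.float64)
    b = np.asarray(band_b, dtype=np.float64)
    total = a + b
    index = np.zeros_like(total)
    # Divide only where the denominator is non-zero; other pixels stay 0.
    np.divide(a - b, total, out=index, where=total != 0)
    return index


def ndwi(green, nir):
    """NDWI, assuming the common (Green - NIR) / (Green + NIR) form."""
    return normalized_difference(green, nir)


def mndwi(green, swir):
    """MNDWI, assuming the common (Green - SWIR) / (Green + SWIR) form."""
    return normalized_difference(green, swir)


def to_uint8(index):
    """Map index values in [-1, 1] linearly onto 0..255 (assumed scaling)."""
    scaled = (np.clip(index, -1.0, 1.0) + 1.0) * 127.5
    return np.round(scaled).astype(np.uint8)


if __name__ == "__main__":
    # Toy reflectance values, not taken from the dataset.
    green = np.array([[0.3, 0.1, 0.0]])
    nir = np.array([[0.1, 0.3, 0.0]])
    print(ndwi(green, nir))
    print(to_uint8(ndwi(green, nir)))
```

Water pixels typically reflect more green than near-infrared light, so they tend toward positive index values, which is why these indices are used for water and shoreline detection.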
Process_Step:
Process_Description:
Ran the CoastSeg toolbox (Fitzpatrick and others, 2024) on Landsat and Sentinel-2A imagery available through Google Earth Engine (Gorelick and others, 2017) for the geography and time period of interest. PlanetScope imagery was downloaded using CoastSeg from the Planet Application Programming Interface. Imagery had a horizontal resolution of between 4 and 30 m, depending on source. Imagery with an original horizontal resolution of 30 m was pan-sharpened to 15 m. The geospatial information has been removed; it is not necessary for the intended purpose of training Machine Learning models for semantic segmentation. Only the red, green, and blue channels have been extracted from the original multispectral imagery and, after pan-sharpening, these three bands are saved as a png format image. Images were labeled using the Doodler software program (Buscombe and others, 2022). The classes used to label imagery were: a) water, b) whitewater (surf), c) sediment, and d) other. In label images, each pixel value encodes a class: the integer 0 is used for the water class, 1 for the whitewater (surf) class, 2 for the sediment class, and 3 for the 'other' class, which is everything not covered by the previous three classes.
Source_Used_Citation_Abbreviation: Sentinel imagery
Source_Used_Citation_Abbreviation: Landsat imagery
Source_Used_Citation_Abbreviation: PlanetScope imagery
Process_Date: 20240801
Process_Step:
Process_Description: Checked output to ensure quality results.
Process_Date: 20240801
Process_Step:
Process_Description:
Organized image data into folders of images and associated label images in JPEG format. Originating imagery dates and times are included in the file names. No positions are encoded in the files because these data are intended solely to train a Machine Learning model to carry out semantic segmentation of coastal imagery for shoreline analysis.
Process_Date: 20240801
Spatial_Data_Organization_Information:
Indirect_Spatial_Reference:
Data were generated within a numerical model scheme. The model training data presented are not for a particular geographic area.
Entity_and_Attribute_Information:
Detailed_Description:
Entity_Type:
Entity_Type_Label:
Each image name contains a string that includes the date and time, as well as the name of the sensor. PS denotes PlanetScope imagery. S2 denotes Sentinel 2A imagery. L5 denotes Landsat 5. L7 denotes Landsat 7. L8 denotes Landsat 8. L9 denotes Landsat 9. Date and time of the imagery (UTC) are in yyyy-mm-dd-hh-MM-SS format (where yyyy is 4-digit year, mm is 2-digit month, dd is 2-digit day, hh is 2-digit hour in 24-hour format, MM is 2-digit minutes, and SS is 2-digit seconds). For example, the file name ‘formodel_dataset2_Duck_2014-02-28-15-41-42_L8.jpg’ is for an RGB image from Landsat 8 collected on 2014-02-28 at 15:41:42. File names ending with ‘label’ indicate images containing labeled classes.
Entity_Type_Definition: JPEG files of RGB imagery and their associated labeled imagery
Entity_Type_Definition_Source: Producer Defined
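As an illustration of the naming convention above, the following minimal sketch extracts the timestamp and sensor from a file name. The regular expression, the function name parse_image_name, and the label file name used in the usage lines are hypothetical; only the example RGB file name, the sensor codes, and the rule that label file names end with 'label' come from this record.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePath

# Sensor codes as listed in this metadata record.
SENSORS = {
    "PS": "PlanetScope",
    "S2": "Sentinel 2A",
    "L5": "Landsat 5",
    "L7": "Landsat 7",
    "L8": "Landsat 8",
    "L9": "Landsat 9",
}

# Assumed pattern: timestamp, underscore, sensor code, then "_", "." or the end.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})_(PS|S2|L5|L7|L8|L9)(?=[_.]|$)"
)


def parse_image_name(file_name):
    """Return acquisition time (UTC), sensor name, and label flag."""
    stem = PurePath(file_name).stem
    match = PATTERN.search(stem)
    if match is None:
        raise ValueError(f"unrecognized file name: {file_name}")
    acquired = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")
    return {
        "acquired": acquired.replace(tzinfo=timezone.utc),
        "sensor": SENSORS[match.group(2)],
        "is_label": stem.endswith("label"),
    }


if __name__ == "__main__":
    name = "formodel_dataset2_Duck_2014-02-28-15-41-42_L8.jpg"
    print(parse_image_name(name))
    # Hypothetical label file name, for illustration only.
    print(parse_image_name(name.replace(".jpg", "_label.jpg")))
```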
Attribute:
Attribute_Label: Label
Attribute_Definition: Image pixel class labeled using Doodler (Buscombe, 2022)
Attribute_Definition_Source: Producer defined
Attribute_Domain_Values:
Enumerated_Domain:
Enumerated_Domain_Value: 0
Enumerated_Domain_Value_Definition: Integer pixel value used for the water class
Enumerated_Domain_Value_Definition_Source: Producer defined
Attribute_Domain_Values:
Enumerated_Domain:
Enumerated_Domain_Value: 1
Enumerated_Domain_Value_Definition: Integer pixel value used for the whitewater (surf) class
Enumerated_Domain_Value_Definition_Source: Producer defined
Attribute_Domain_Values:
Enumerated_Domain:
Enumerated_Domain_Value: 2
Enumerated_Domain_Value_Definition: Integer pixel value used for the sediment class
Enumerated_Domain_Value_Definition_Source: Producer defined
Attribute_Domain_Values:
Enumerated_Domain:
Enumerated_Domain_Value: 3
Enumerated_Domain_Value_Definition:
Integer pixel value used for the ‘other’ class, which is everything not covered by the previous three classes
Enumerated_Domain_Value_Definition_Source: Producer defined
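The enumerated domain above can be expressed as a lookup table. The sketch below, which assumes a label image has already been read into rows of integer pixel values, reports the fraction of pixels in each class and rejects values outside the domain. The function name class_fractions and the toy label are hypothetical.

```python
from collections import Counter

# Integer pixel values and class names from the enumerated domain above.
CLASS_NAMES = {
    0: "water",
    1: "whitewater (surf)",
    2: "sediment",
    3: "other",
}


def class_fractions(label_rows):
    """Return the fraction of pixels per class for rows of integer labels."""
    counts = Counter(value for row in label_rows for value in row)
    unknown = sorted(set(counts) - set(CLASS_NAMES))
    if unknown:
        raise ValueError(f"pixel values outside the label domain: {unknown}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("label image contains no pixels")
    return {
        name: counts.get(value, 0) / total
        for value, name in CLASS_NAMES.items()
    }


if __name__ == "__main__":
    # Toy 2 x 3 label image, not taken from the dataset.
    toy_label = [[0, 0, 1], [2, 3, 3]]
    for name, fraction in class_fractions(toy_label).items():
        print(f"{name}: {fraction:.2f}")
```

A check like this is one way to spot class imbalance, or unexpected pixel values, before the labels are passed to a training pipeline.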
Overview_Description:
Entity_and_Attribute_Overview:
These data are designed to train and test a Machine Learning model that is tasked with semantic segmentation of imagery into 4 discrete classes, a) water, b) whitewater/surf, c) sediment, and d) other, for the purposes of shoreline detection. These images have been labeled using Doodler (Buscombe and others, 2022). No positions are encoded in the files because these data are intended solely to train a Machine Learning model to carry out semantic segmentation of coastal imagery for shoreline analysis. These images and associated labels are designed to be used to create a segmentation model following the methods of Buscombe and Goldstein (2022). This model would then carry out semantic satellite image segmentation for the principal purpose of finding water and sand pixels, so the shoreline can be identified automatically from satellite imagery. The imagery has been organized into 3 zipped folders. The zipped folder images_and_labels_RGB.zip contains RGB imagery and associated labels. There are separate subsets for Alaska-only (images_AK_RGB.zip and labels_AK_RGB.zip), all locations (images_ALL_RGB.zip and labels_ALL_RGB.zip), non-Alaska locations (images_nonAK_RGB.zip and labels_nonAK_RGB.zip), and the vicinity of Duck, North Carolina (images_Duck_RGB.zip and labels_Duck_RGB.zip). The zipped folder images_and_labels_NDWI.zip contains NDWI imagery and associated labels. There are separate subsets for Alaska-only (images_AK_NDWI.zip and labels_AK_NDWI.zip) and all locations (images_ALL_NDWI.zip and labels_ALL_NDWI.zip). The zipped folder images_and_labels_MNDWI.zip contains MNDWI imagery and associated labels. There are separate subsets for Alaska-only (images_AK_MNDWI.zip and labels_AK_MNDWI.zip) and all locations (images_ALL_MNDWI.zip and labels_ALL_MNDWI.zip).
Entity_and_Attribute_Detail_Citation: U.S. Geological Survey
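Because images and labels are distributed in separate archives, a training pipeline needs to match each image to its label. The sketch below shows one way this might be done on lists of file names. It assumes that a label file shares the image file's stem followed by a '_label' suffix; that suffix, the function name pair_images_with_labels, and the file names in the usage lines are hypothetical, because this record states only that label file names end with 'label'.

```python
from pathlib import PurePath


def pair_images_with_labels(image_names, label_names, label_suffix="_label"):
    """Match images to labels by file stem; return (pairs, unmatched images)."""
    labels_by_key = {}
    for name in label_names:
        stem = PurePath(name).stem
        if stem.endswith(label_suffix):
            # Key each label by its stem with the assumed suffix removed.
            labels_by_key[stem[: -len(label_suffix)]] = name

    pairs, unmatched = [], []
    for name in image_names:
        key = PurePath(name).stem
        if key in labels_by_key:
            pairs.append((name, labels_by_key[key]))
        else:
            unmatched.append(name)
    return pairs, unmatched


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    images = ["site_2014-02-28-15-41-42_L8.jpg", "site_2015-03-01-15-40-00_L8.jpg"]
    labels = ["site_2014-02-28-15-41-42_L8_label.jpg"]
    pairs, unmatched = pair_images_with_labels(images, labels)
    print(pairs)
    print(unmatched)
```

Returning the unmatched images separately, rather than silently dropping them, makes it easier to confirm that the assumed naming rule actually fits the files in a given archive.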
Distribution_Information:
Distributor:
Contact_Information:
Contact_Organization_Primary:
Contact_Organization: U.S. Geological Survey - CMGDS
Contact_Address:
Address_Type: mailing and physical
Address: 2885 Mission Street
City: Santa Cruz
State_or_Province: CA
Postal_Code: 95060
Contact_Voice_Telephone: 831-427-4747
Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov
Resource_Description:
Image data are organized into folders of images and associated label images in JPEG format. These data are ready to be ingested into a deep learning or machine learning model training pipeline (for example, Buscombe and Goldstein, 2022), using software such as TensorFlow, Keras, or PyTorch.
Distribution_Liability:
Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty.
Standard_Order_Process:
Digital_Form:
Digital_Transfer_Information:
Format_Name: JPEG
Format_Information_Content: images
Transfer_Size: 834.3
Digital_Transfer_Option:
Online_Option:
Computer_Contact_Information:
Network_Address:
Network_Resource_Name: https://doi.org/10.5066/P13EOBZQ
Access_Instructions:
Data can be downloaded by following the Network_Resource_Name link and then scrolling down to the Imagery Data section.
Fees: None.
Technical_Prerequisites:
These data can be viewed with image (picture) viewing software or numerical processing software such as Python or MATLAB.
Metadata_Reference_Information:
Metadata_Date: 20250325
Metadata_Contact:
Contact_Information:
Contact_Organization_Primary:
Contact_Organization:
U.S. Geological Survey, Pacific Coastal and Marine Science Center
Contact_Person: PCMSC Science Data Coordinator
Contact_Address:
Address_Type: mailing and physical
Address: 2885 Mission Street
City: Santa Cruz
State_or_Province: CA
Postal_Code: 95060
Contact_Voice_Telephone: 831-427-4747
Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov
Metadata_Standard_Name: Content Standard for Digital Geospatial Metadata
Metadata_Standard_Version: FGDC-STD-001-1998

This page is <https://cmgds.marine.usgs.gov/catalog/pcmsc/DataReleases/CMGDS_DR_tool/DR_P13EOBZQ/images_and_labels_metadata.html>
Generated by mp version 2.9.51 on Tue Apr 1 10:12:09 2025