Metadata:
  Identification_Information:
    Citation:
      Citation_Information:
        Originator: Daniel Buscombe
        Originator: Mark A. Lundine
        Originator: Sharon Batiste
        Originator: Catherine N. Janda
        Publication_Date: 20250325
        Title: Labeled satellite imagery for training machine learning models that predict the suitability of imagery for shoreline extraction.
        Geospatial_Data_Presentation_Form: JPEG
        Series_Information:
          Series_Name: data release
          Issue_Identification: DOI:10.5066/P14MDKVJ
        Publication_Information:
          Publication_Place: Pacific Coastal and Marine Science Center, Santa Cruz, California
          Publisher: U.S. Geological Survey
        Other_Citation_Details: Suggested Citation: Buscombe, D., Lundine, M.A., Batiste, S., and Janda, C.N., 2025, Labeled satellite imagery for training machine learning models that predict the suitability of imagery for shoreline extraction, U.S. Geological Survey data release, https://doi.org/10.5066/P14MDKVJ.
        Online_Linkage: https://doi.org/10.5066/P14MDKVJ
    Description:
      Abstract: A labeled dataset of Landsat, Sentinel, and PlanetScope visible-band satellite images of coastal shoreline regions, organized into folders of images labeled as either suitable or unsuitable for shoreline detection using existing conventional approaches such as CoastSat (Vos and others, 2019) or CoastSeg (Fitzpatrick and others, 2024). These data are intended as inputs to models that determine whether an image is suitable for shoreline detection. They are to be used only as a training and validation dataset for a machine learning model designed specifically to determine the suitability of an image for estimating shoreline location.
      Purpose: These data support automated detection of coastal shoreline position and are provided for resource managers, science researchers, students, and the general public. The data can be viewed with image viewing software and used within specialist software to train machine learning models that identify imagery suitable or unsuitable for shoreline mapping. The imagery is organized into two folders: one for training and one for testing a machine learning model.
      Supplemental_Information: This data release was funded by the USGS Coastal and Marine Hazards and Resources Program. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. This data release contains modified PlanetScope imagery, provided under the NASA (National Aeronautics and Space Administration) CSDA (Commercial Satellite Data Acquisition) program under the standard Scientific Use License available at https://cdn.earthdata.nasa.gov/conduit/upload/14226/PlanetEULA042220.pdf, and the End User License Agreement available at https://earthdata.nasa.gov/s3fs-public/2022-02/Planet_Expanded_EULA_06-21.pdf. This license permits redistribution of imagery in significantly modified form. We provide only the visible (R, G, and B) bands of small sub-portions of downloaded tiles, in JPEG format. As such, the original imagery (multispectral scenes in GeoTIFF format) has been cropped, stripped of its geospatial information, and re-encoded into 8-bit JPEG format.
    Time_Period_of_Content:
      Time_Period_Information:
        Range_of_Dates/Times:
          Beginning_Date: 1984
          Ending_Date: 2024
      Currentness_Reference: collection years of satellite imagery.
    Status:
      Progress: Complete
      Maintenance_and_Update_Frequency: None Planned
    Spatial_Domain:
      Bounding_Coordinates:
        West_Bounding_Coordinate: -180.00000
        East_Bounding_Coordinate: 180.00000
        North_Bounding_Coordinate: 90.00000
        South_Bounding_Coordinate: -90.00000
    Keywords:
      Theme:
        Theme_Keyword_Thesaurus: USGS Metadata Identifier
        Theme_Keyword: USGS:799a4c57-8bcd-40a9-91ec-26dc0c8b9be5
      Theme:
        Theme_Keyword_Thesaurus: Global Change Master Directory
        Theme_Keyword: Hazards Planning
        Theme_Keyword: Ocean Waves
        Theme_Keyword: Erosion
        Theme_Keyword: Sea Level Rise
        Theme_Keyword: Extreme Weather
      Theme:
        Theme_Keyword_Thesaurus: ISO 19115 Topic Category
        Theme_Keyword: Oceans
        Theme_Keyword: ClimatologyMeteorologyAtmosphere
      Theme:
        Theme_Keyword_Thesaurus: Data Categories for Marine Planning
        Theme_Keyword: Physical Habitats and Geomorphology
      Theme:
        Theme_Keyword_Thesaurus: USGS Thesaurus
        Theme_Keyword: Climate Change
        Theme_Keyword: Storms
        Theme_Keyword: Sea-level Change
      Theme:
        Theme_Keyword_Thesaurus: Marine Realms Information Bank (MRIB) keywords
        Theme_Keyword: sea level change
        Theme_Keyword: waves
        Theme_Keyword: coastal erosion
      Theme:
        Theme_Keyword_Thesaurus: None
        Theme_Keyword: U.S. Geological Survey
        Theme_Keyword: USGS
        Theme_Keyword: Coastal and Marine Hazards and Resources Program
        Theme_Keyword: CMHRP
        Theme_Keyword: Pacific Coastal and Marine Science Center
        Theme_Keyword: PCMSC
    Access_Constraints: No access constraints
    Use_Constraints: USGS-authored or produced data and information are in the public domain from the U.S. Government and are freely redistributable with proper metadata and source attribution. Please recognize and acknowledge the U.S. Geological Survey as the originator(s) of the dataset and in products derived from these data.
    Point_of_Contact:
      Contact_Information:
        Contact_Organization_Primary:
          Contact_Organization: U.S. Geological Survey, Pacific Coastal and Marine Science Center
          Contact_Person: PCMSC Science Data Coordinator
        Contact_Address:
          Address_Type: mailing and physical
          Address: 2885 Mission Street
          City: Santa Cruz
          State_or_Province: CA
          Postal_Code: 95060
        Contact_Voice_Telephone: 831-427-4747
        Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov
    Browse_Graphic:
      Browse_Graphic_File_Name: Global_map_of_image_locations.png
      Browse_Graphic_File_Description: Image map showing locations of satellite imagery.
      Browse_Graphic_File_Type: PNG
    Data_Set_Credit: This data release was funded by the USGS Coastal and Marine Hazards and Resources Program.
    Native_Data_Set_Environment: The datasets were created on a Windows 11 operating system using Python 3.10. Results were output and saved in JPEG format.
    Cross_Reference:
      Citation_Information:
        Originator: Fitzpatrick, S.
        Originator: Buscombe, D.
        Originator: Warrick, J.A.
        Originator: Lundine, M.A.
        Originator: Vos, K.
        Publication_Date: 2024
        Title: CoastSeg: an accessible and extendable hub for satellite-derived-shoreline (SDS) detection and mapping
        Other_Citation_Details: Fitzpatrick, S., Buscombe, D., Warrick, J.A., Lundine, M.A., and Vos, K., 2024, CoastSeg: an accessible and extendable hub for satellite-derived-shoreline (SDS) detection and mapping: Journal of Open Source Software, v. 9, no. 99, 6683.
        Online_Linkage: https://doi.org/10.21105/joss.06683
    Cross_Reference:
      Citation_Information:
        Originator: Vos, K.
        Originator: Harley, M.D.
        Originator: Splinter, K.D.
        Originator: Simmons, J.A.
        Originator: Turner, I.L.
        Publication_Date: 2019
        Title: Sub-annual to multi-decadal shoreline variability from publicly available satellite imagery
        Other_Citation_Details: Vos, K., Harley, M.D., Splinter, K.D., Simmons, J.A., and Turner, I.L., 2019, Sub-annual to multi-decadal shoreline variability from publicly available satellite imagery: Coastal Engineering, v. 150, p. 160-174.
        Online_Linkage: https://doi.org/10.1016/j.coastaleng.2019.04.004
    Cross_Reference:
      Citation_Information:
        Originator: Gorelick, N.
        Originator: Hancher, M.
        Originator: Dixon, M.
        Originator: Ilyushchenko, S.
        Originator: Thau, D.
        Originator: Moore, R.
        Publication_Date: 2017
        Title: Google Earth Engine: Planetary-scale geospatial analysis for everyone.
        Other_Citation_Details: Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., and Moore, R., 2017, Google Earth Engine: Planetary-scale geospatial analysis for everyone: Remote Sensing of Environment, v. 202, p. 18-27.
        Online_Linkage: https://doi.org/10.1016/j.rse.2017.06.031
  Data_Quality_Information:
    Logical_Consistency_Report: Data have undergone QA/QC and fall within expected/reasonable ranges.
    Completeness_Report: Data set is considered complete for the information presented. Users are advised to read the rest of the metadata record carefully for additional details.
    Lineage:
      Source_Information:
        Source_Citation:
          Citation_Information:
            Originator: U.S. Geological Survey
            Publication_Date: 2025
            Title: Landsat imagery (from Landsat 8-9)
            Geospatial_Data_Presentation_Form: PNG image
            Publication_Information:
              Publication_Place: online
              Publisher: U.S. Geological Survey
            Online_Linkage: https://doi.org/10.5066/P9OGBGM6
        Type_of_Source_Media: online database
        Source_Time_Period_of_Content:
          Time_Period_Information:
            Range_of_Dates/Times:
              Beginning_Date: 19840101
              Ending_Date: 20241231
          Source_Currentness_Reference: collection years of satellite imagery
        Source_Citation_Abbreviation: Landsat imagery
        Source_Contribution: The archive of Landsat 8 and Landsat 9 satellite imagery was accessed through Google Earth Engine.
      Source_Information:
        Source_Citation:
          Citation_Information:
            Originator: U.S. Geological Survey
            Publication_Date: 2025
            Title: Landsat imagery (from Landsat 7)
            Geospatial_Data_Presentation_Form: PNG image
            Publication_Information:
              Publication_Place: online
              Publisher: U.S. Geological Survey
            Online_Linkage: https://doi.org/10.5066/P9C7I13B
        Type_of_Source_Media: online database
        Source_Time_Period_of_Content:
          Time_Period_Information:
            Range_of_Dates/Times:
              Beginning_Date: 19840101
              Ending_Date: 20241231
          Source_Currentness_Reference: collection years of satellite imagery
        Source_Citation_Abbreviation: Landsat imagery
        Source_Contribution: The archive of Landsat 7 satellite imagery was accessed through Google Earth Engine.
      Source_Information:
        Source_Citation:
          Citation_Information:
            Originator: U.S. Geological Survey
            Publication_Date: 2025
            Title: Landsat imagery (from Landsat 5)
            Geospatial_Data_Presentation_Form: PNG image
            Publication_Information:
              Publication_Place: online
              Publisher: U.S. Geological Survey
            Online_Linkage: https://doi.org/10.5066/P9IAXOVV
        Type_of_Source_Media: online database
        Source_Time_Period_of_Content:
          Time_Period_Information:
            Range_of_Dates/Times:
              Beginning_Date: 19840101
              Ending_Date: 20241231
          Source_Currentness_Reference: collection years of satellite imagery
        Source_Citation_Abbreviation: Landsat imagery
        Source_Contribution: The archive of Landsat 5 satellite imagery was accessed through Google Earth Engine.
      Source_Information:
        Source_Citation:
          Citation_Information:
            Originator: Copernicus, a program of the European Union
            Publication_Date: 2025
            Title: Sentinel-2 imagery
            Geospatial_Data_Presentation_Form: PNG image
            Publication_Information:
              Publication_Place: online
              Publisher: Copernicus, a program of the European Union
            Online_Linkage: https://dataspace.copernicus.eu/explore-data/data-collections/sentinel-data/sentinel-2
        Type_of_Source_Media: online database
        Source_Time_Period_of_Content:
          Time_Period_Information:
            Range_of_Dates/Times:
              Beginning_Date: 19840101
              Ending_Date: 20241231
          Source_Currentness_Reference: collection years of satellite imagery
        Source_Citation_Abbreviation: Sentinel-2A imagery
        Source_Contribution: The archive of Sentinel 2A satellite imagery was accessed through Google Earth Engine.
      Source_Information:
        Source_Citation:
          Citation_Information:
            Originator: Planet Labs PBC
            Publication_Date: 2025
            Title: PlanetScope imagery
            Geospatial_Data_Presentation_Form: JPEG image
            Publication_Information:
              Publication_Place: online
              Publisher: Planet Labs PBC
        Type_of_Source_Media: online database
        Source_Time_Period_of_Content:
          Time_Period_Information:
            Range_of_Dates/Times:
              Beginning_Date: 20150101
              Ending_Date: 20241231
          Source_Currentness_Reference: collection years of satellite imagery
        Source_Citation_Abbreviation: PlanetScope imagery
        Source_Contribution: The archive of PlanetScope satellite imagery was accessed through the PlanetScope Application Programming Interface.
      Process_Step:
        Process_Description: Set up the CoastSeg toolbox (Fitzpatrick and others, 2024) in Python 3.10 to run for the regions of interest, spanning coastlines at numerous locations worldwide (see map), for the time period of 01 March 1984 to 31 December 2024. Images were then manually classified as either suitable or unsuitable for analysis.
        Process_Date: 20240701
      Process_Step:
        Process_Description: Ran the CoastSeg toolbox on Landsat and Sentinel-2 imagery available through Google Earth Engine (Gorelick and others, 2017) for the geography and time period of interest. Imagery had a horizontal resolution of between 4 and 30 m, depending on source. Imagery with an original horizontal resolution of 30 m was pan-sharpened to 15 m. The geospatial information has been removed because it is not necessary for the intended purpose of training machine learning models to discriminate between suitable and unsuitable imagery. Only the red, green, and blue channels were extracted from the original multispectral imagery and, after pan-sharpening, these three bands were saved as a JPEG format image.
        Source_Used_Citation_Abbreviation: Sentinel-2A imagery
        Source_Used_Citation_Abbreviation: Landsat imagery
        Source_Used_Citation_Abbreviation: PlanetScope imagery
        Process_Date: 20240801
      Process_Step:
        Process_Description: Checked output to ensure quality results.
        Process_Date: 20240801
      Process_Step:
        Process_Description: Organized image data into folders of suitable and unsuitable images. Originating imagery dates and times are included in the file names. No positions are encoded in the files because these data are intended solely to train a machine learning model to identify imagery suitable for shoreline analysis (such as imagery in which the shoreline is visible to the human eye). The imagery is organized into two folders: one for training and one for testing a machine learning model.
        Process_Date: 20240801
  Spatial_Data_Organization_Information:
    Indirect_Spatial_Reference: Data were generated within a numerical model scheme. The model training data presented are not for a particular geographic area.
  Entity_and_Attribute_Information:
    Overview_Description:
      Entity_and_Attribute_Overview:
        These data are designed to train and test a machine learning model that is tasked with recognizing suitable and unsuitable imagery for the purposes of shoreline detection. These images have been manually classified. There is one zipped folder for images used for training a machine learning model, called ‘train’, and another zipped folder, called ‘test’, for images used for testing that model once trained. Inside each of the test and train zipped folders, there are two folders of JPEG images: ‘good’ (suitable for shoreline extraction using CoastSeg or CoastSat) and ‘bad’ (unsuitable for shoreline extraction using CoastSeg or CoastSat). This dataset consists of visible-band images, which are inputs to segmentation and other machine learning models that are used for the purposes of shoreline detection. Each image name contains a string that includes the date and time, as well as the name of the sensor. PS denotes PlanetScope imagery. S2 denotes Sentinel 2A imagery. L5 denotes Landsat 5. L7 denotes Landsat 7. L8 denotes Landsat 8. L9 denotes Landsat 9. The image date and time (UTC) are in yyyy-mm-dd-hh-MM-SS format (where yyyy is 4-digit year, mm is 2-digit month, dd is 2-digit day, hh is 2-digit hour in 24-hour format, MM is 2-digit minutes, and SS is 2-digit seconds). For example, the file name 'ID_spl62_datetime06-21-24__05_32_07_2017-08-27-10-56-52_RGB_S2.jpg' is for an image from Sentinel 2A collected on 2017-08-27 at 10:56:52.
        These data are ready to be ingested into a deep learning or machine learning model training pipeline, using software such as TensorFlow, Keras, or PyTorch. Image file names have a prefix identifier of either "ID_" or "Merbok_", which is not important for the intended use of the images.
      Entity_and_Attribute_Detail_Citation: U.S. Geological Survey
  Distribution_Information:
    Distributor:
      Contact_Information:
        Contact_Organization_Primary:
          Contact_Organization: U.S. Geological Survey - CMGDS
        Contact_Address:
          Address_Type: mailing and physical
          Address: 2885 Mission Street
          City: Santa Cruz
          State_or_Province: CA
          Postal_Code: 95060
        Contact_Voice_Telephone: 831-427-4747
        Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov
    Resource_Description: These data are available in zipped folders. There is one zipped folder for images used for training a machine learning model, called ‘train’, and another zipped folder, called ‘test’, for images used for testing that model once trained. Inside each of the test and train zipped folders, there are two folders of JPEG images: ‘good’ (suitable for shoreline extraction using CoastSeg or CoastSat) and ‘bad’ (unsuitable for shoreline extraction using CoastSeg or CoastSat). Each image name contains a string that includes the date and time, as well as the name of the sensor. PS denotes PlanetScope imagery. S2 denotes Sentinel 2A imagery. L5 denotes Landsat 5. L7 denotes Landsat 7. L8 denotes Landsat 8. L9 denotes Landsat 9.
    Distribution_Liability: Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty.
    Standard_Order_Process:
      Digital_Form:
        Digital_Transfer_Information:
          Format_Name: JPEG
          Format_Information_Content: zip files containing images
          File_Decompression_Technique: WinZip or archive utility
          Transfer_Size: 3000
        Digital_Transfer_Option:
          Online_Option:
            Computer_Contact_Information:
              Network_Address:
                Network_Resource_Name: https://doi.org/10.5066/P14MDKVJ
            Access_Instructions: Data can be downloaded by following the Network_Resource_Name link and scrolling down to the Imagery Data section.
      Fees: None.
    Technical_Prerequisites: These data can be viewed with image (picture) viewing software or numerical processing software such as Python or MATLAB.
  Metadata_Reference_Information:
    Metadata_Date: 20250325
    Metadata_Contact:
      Contact_Information:
        Contact_Organization_Primary:
          Contact_Organization: U.S. Geological Survey, Pacific Coastal and Marine Science Center
          Contact_Person: PCMSC Science Data Coordinator
        Contact_Address:
          Address_Type: mailing and physical
          Address: 2885 Mission Street
          City: Santa Cruz
          State_or_Province: CA
          Postal_Code: 95060
        Contact_Voice_Telephone: 831-427-4747
        Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov
    Metadata_Standard_Name: Content Standard for Digital Geospatial Metadata
    Metadata_Standard_Version: FGDC-STD-001-1998