Metadata: Identification_Information: Citation: Citation_Information: Originator: Maria C. Figueroa Matias Publication_Date: 20251215 Title: Machine Learning Model: Estimates of Metal Abundance in Global Seafloor Massive Sulfide Deposits Geospatial_Data_Presentation_Form: Model Series_Information: Series_Name: Data Release Issue_Identification: DOI:10.5066/P13PYBJL Publication_Information: Publication_Place: Pacific Coastal and Marine Science Center, Santa Cruz, California Publisher: U.S. Geological Survey Other_Citation_Details: Suggested Citation: Figueroa Matias, M.C., 2025, Machine Learning Model: Estimates of Metal Abundance in Global Seafloor Massive Sulfide Deposits: U.S. Geological Survey data release, https://doi.org/10.5066/P13PYBJL. Online_Linkage: https://doi.org/10.5066/P13PYBJL Description: Abstract: A multi-stage ensembled machine learning model was developed to estimate metal abundances in seafloor massive sulfide deposits worldwide. The modeling framework integrates (1) KMeans++ clustering to identify geochemical groupings based on enrichment controls, (2) Random Forest classification to assign geochemical labels to vent fields with incomplete or absent geochemical data, and (3) XGBoost regression to generate high-fidelity predictions of metal concentrations. This USGS model application data release includes all scripts, input files, and output files necessary to apply the model to estimate concentrations of cobalt, gold, and zinc. This model is not limited by spatial boundaries and is intended for application to any oceanic location with appropriate input data. Purpose: The purpose of this data release is to provide a machine learning framework and supporting files developed to estimate cobalt, gold, and zinc concentrations in seafloor massive sulfide (SMS) deposits. 
The model supports efforts to better understand geochemical variability and metal enrichment in SMS systems and to improve deep-sea mineral resource assessments across diverse tectonic settings. Supplemental_Information: See SMS-MetalML_reference-list.pdf for details on all external sources used in this work. See the README.md files for additional information on the operating system and software versions used to develop this model, the directory structure, and files not listed here in the metadata. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Time_Period_of_Content: Time_Period_Information: Range_of_Dates/Times: Beginning_Date: 19761122 Ending_Date: 20240621 Currentness_Reference: publication date of data used to train the model Status: Progress: Complete Maintenance_and_Update_Frequency: None Planned Spatial_Domain: Bounding_Coordinates: West_Bounding_Coordinate: -180 East_Bounding_Coordinate: 180 North_Bounding_Coordinate: 90 South_Bounding_Coordinate: -90 Keywords: Theme: Theme_Keyword_Thesaurus: USGS Thesaurus Theme_Keyword: modeling Theme_Keyword: critical minerals Theme_Keyword: mineral resources Theme_Keyword: sea-floor characteristics Theme_Keyword: marine chemistry Theme: Theme_Keyword_Thesaurus: ISO 19115 Topic Category thesaurus Theme_Keyword: geoscientificInformation Theme_Keyword: oceans Theme_Keyword: economy Theme: Theme_Keyword_Thesaurus: None Theme_Keyword: U.S. Geological Survey Theme_Keyword: USGS Theme_Keyword: Coastal and Marine Hazards and Resources Program Theme_Keyword: CMHRP Theme_Keyword: Pacific Coastal and Marine Science Center Theme_Keyword: PCMSC Theme_Keyword: machine learning Theme: Theme_Keyword_Thesaurus: USGS Metadata Identifier Theme_Keyword: USGS:67f010a4d4be02766d636810 Access_Constraints: No access constraints. Acknowledgment of the U.S. Geological Survey would be appreciated in products derived from this model application release. 
Use_Constraints: USGS-authored or produced data and information are in the public domain from the U.S. Government and are freely redistributable with proper metadata and source attribution. These data are licensed under CC BY 4.0 and users must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. Please recognize and acknowledge the U.S. Geological Survey as the originator(s) of the dataset and in products derived from these data. Although the information contained in the model files may be useful for other purposes, it is incumbent on the user to understand the purpose, construction, and limitations of this model. This information is not intended for navigation purposes. Point_of_Contact: Contact_Information: Contact_Organization_Primary: Contact_Organization: U.S. Geological Survey, Pacific Coastal and Marine Science Center Contact_Person: PCMSC Science Data Coordinator Contact_Address: Address_Type: mailing and physical Address: 2885 Mission Street City: Santa Cruz State_or_Province: CA Postal_Code: 95060 Contact_Voice_Telephone: 831-427-4747 Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov Native_Data_Set_Environment: Windows 11 Enterprise Version 23H2, Python 3.11.4. Data_Quality_Information: Attribute_Accuracy: Attribute_Accuracy_Report: This model was developed using a machine learning approach that does not rely on traditional calibration targets. Instead, the model was trained and tested on a curated dataset of seafloor massive sulfide geochemical analyses. Accuracy was evaluated through standard machine learning methods, including cross-validation (10-fold) and performance metrics (silhouette scores, confusion matrices, root mean square error (RMSE), R squared) applied to a training and test set, split into 80/20. 
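The evaluation terms named in the Attribute_Accuracy_Report (80/20 split, 10-fold cross-validation, RMSE, R squared) can be illustrated with a minimal sketch. It is not taken from the release scripts: the function names, the fixed seed, and the unshuffled fold layout are assumptions, and only the Python standard library is used so that the arithmetic is visible. scikit-learn provides equivalent helpers (`train_test_split`, `KFold`, `mean_squared_error`, `r2_score`).

```python
import math
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80/20 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an illustrative choice
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def kfold_indices(n_samples, n_folds=10):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because the released outputs are described as log-transformed, an RMSE computed this way would be in log10 units of the target metal rather than in ppm, ppb, or wt percent.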
Logical_Consistency_Report: Data were reviewed and processed to ensure consistency and minimize bias. Elemental concentrations were standardized across sources using unit conversion, as necessary, such that major elements were reported in weight percent (wt percent), while trace elements were expressed in parts per million (ppm) or parts per billion (ppb; for example, Au). Elemental concentrations below detection limits were imputed with half the detection limit value. Samples with a high proportion of weathering or gangue material were removed by retaining only samples with Al2O3 less than 2.5 wt percent and Ba less than 2 wt percent. The remaining samples were further filtered to include only those representative of seafloor massive sulfide material by applying the threshold S more than 10 wt percent. Columns with more than 55 percent missing values and rows with more than 40 percent missing values were excluded. Data were also balanced across vent sites by downsampling overrepresented locations and removing statistical outliers. Completeness_Report: Dataset is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details. Positional_Accuracy: Horizontal_Positional_Accuracy: Horizontal_Positional_Accuracy_Report: No formal positional accuracy tests were conducted, nor are they applicable. Vertical_Positional_Accuracy: Vertical_Positional_Accuracy_Report: No formal positional accuracy tests were conducted, nor are they applicable. Lineage: Process_Step: Process_Description: Details of processing steps contained within this release are below. For additional information, please see the relevant README.md within each folder. A multi-stage machine learning framework was developed to estimate cobalt, gold, and zinc concentrations in seafloor massive sulfide deposits. 
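The screening rules listed in the Logical_Consistency_Report above can be sketched as follows. This is an illustration under stated assumptions, not the release code: the column names (`Al2O3`, `Ba`, `S`), the `<x` notation for below-detection values, the choice to drop samples that lack a screening value, and the order of the two missing-value filters (columns first, then rows) are all hypothetical.

```python
def half_detection_limit(raw):
    """Convert a raw cell to float; '<x' (below detection) becomes x / 2.

    Assumption: below-detection values are written as '<x'; empty cells
    and None are treated as missing.
    """
    if raw is None or str(raw).strip() == "":
        return None
    text = str(raw).strip()
    if text.startswith("<"):
        return float(text[1:]) / 2.0
    return float(text)


def is_massive_sulfide(sample):
    """Keep samples with Al2O3 < 2.5, Ba < 2, and S > 10 (all wt percent).

    Assumption: a sample missing any of the three values is rejected.
    """
    al2o3, ba, s = sample.get("Al2O3"), sample.get("Ba"), sample.get("S")
    if al2o3 is None or ba is None or s is None:
        return False
    return al2o3 < 2.5 and ba < 2.0 and s > 10.0


def drop_sparse(rows, columns, max_col_missing=0.55, max_row_missing=0.40):
    """Drop columns with > 55% missing values, then rows with > 40% missing."""
    if not rows:
        return list(columns), []
    kept_cols = [c for c in columns
                 if sum(r.get(c) is None for r in rows) / len(rows)
                 <= max_col_missing]
    if not kept_cols:
        return [], []
    kept_rows = []
    for r in rows:
        frac_missing = sum(r.get(c) is None for c in kept_cols) / len(kept_cols)
        if frac_missing <= max_row_missing:
            kept_rows.append({c: r.get(c) for c in kept_cols})
    return kept_cols, kept_rows
```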
Workflow overview: (1) Cluster Analysis using K-Means++ (Folder: 1_KMeans): Forms geochemical cluster groups from a geochemical dataset (`Input_dataset_250227.csv`). (2) Classification using Random Forest (Folder: 2_Random Forest): Assigns the geochemical clusters from the K-Means++ cluster analysis to samples without geochemical data. (3) Regression using XGBoost (Folder: 3_XGBoost): Predicts metal concentrations using the geochemical clusters assigned from Random Forest and additional geophysical features (for example, depth, tectonic setting, spreading rate). This process was executed using Python 3.11.4, and all scripts, input files, and outputs necessary to replicate or apply the model are included in this model application data release. See the accompanying “SMS-MetalML_metadata_references.pdf” file in the Attached Files section for a full list of sources. Process_Date: 20250227 Spatial_Data_Organization_Information: Indirect_Spatial_Reference: Data were generated within a numerical model scheme. The model results presented are not for a particular geographic area. Spatial_Reference_Information: Horizontal_Coordinate_System_Definition: Geographic: Latitude_Resolution: 1e-05 Longitude_Resolution: 1e-05 Geographic_Coordinate_Units: Decimal degrees Geodetic_Model: Horizontal_Datum_Name: WGS84 Ellipsoid_Name: WGS_1984 Semi-major_Axis: 6378137.0 Denominator_of_Flattening_Ratio: 298.257 Entity_and_Attribute_Information: Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML.zip Entity_Type_Definition: zip folder containing all data and script files associated with the SMS machine learning model application for metal prediction. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. 
Geological Survey Attribute: Attribute_Label: SMS-MetalML/SMS-MetalML_Data_Dictionary.csv Attribute_Definition: Data dictionary containing variable names, units, descriptions, and types for the full training data file: SMS-MetalML/1_KMeans/input_files/Input_dataset_250227.csv Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: A metadata csv describing the attributes in the training data file found in SMS-MetalML/1_KMeans/input_files/Input_dataset_250227.csv. Attribute: Attribute_Label: SMS-MetalML/Readme.md Attribute_Definition: Markdown file for the overall SMS-MetalML model Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: A markdown file with instructions on running the SMS-MetalML model workflow Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/1_KMeans Entity_Type_Definition: Folder containing the KMeans++ clustering script and input/output files used to group samples by geochemical characteristics. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. Geological Survey Attribute: Attribute_Label: SMS-MetalML/1_KMeans/run_kmeans.py Attribute_Definition: Python script to run KMeans++ clustering. Instructions are included for switching target metals (for example, Co, Zn, Au). 
Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Python script to run KMeans++ clustering Attribute: Attribute_Label: SMS-MetalML/1_KMeans/Readme.md Attribute_Definition: Markdown file for SMS-MetalML Part I K-Means++ Cluster Analysis Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: A markdown file with instructions on running the K-Means++ Cluster Analysis step of the SMS-MetalML workflow Attribute: Attribute_Label: SMS-MetalML/1_KMeans/KMeans_data_dictionary.csv Attribute_Definition: Data dictionary describing input and output comma-delimited table headers unique to the 1_KMeans stage Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: A metadata csv describing the model-generated unique attributes in comma-delimited tables from the code workflow for KMeans++ clustering. Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/1_KMeans/input_files Entity_Type_Definition: Folder containing input files used to group samples by geochemical characteristics. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. Geological Survey Attribute: Attribute_Label: SMS-MetalML/1_KMeans/input_files/Input_dataset_250227.csv Attribute_Definition: Working dataset used to train and apply the SMS-MetalML model for metal prediction Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing input data for K-Means++ Cluster Analysis. Detailed attribute descriptions for all attributes are available in SMS-MetalML_Data_Dictionary.csv. Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/1_KMeans/output_files Entity_Type_Definition: Folder containing output files from the K-Means++ clustering. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. 
Geological Survey Attribute: Attribute_Label: SMS-MetalML/1_KMeans/output_files/cluster_centroids_Au_(ppb)_log10.csv Attribute_Definition: Output centroid value for each cluster across scaled features. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing centroid values across scaled features for each cluster for the target metal gold (Au). Each row corresponds to a cluster, and each column corresponds to a scaled feature used in the clustering (for example, geochemical variables, depth, and one-hot encoded categorical features such as tectonic setting and spreading rate). Values are numeric and represent the feature’s centroid value for that cluster. See README.md in the 1_KMeans folder for the list and description of features. Attribute: Attribute_Label: SMS-MetalML/1_KMeans/output_files/cluster_centroids_Co_(ppm)_log10.csv Attribute_Definition: Output centroid value for each cluster across scaled features. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing centroid values across scaled features for each cluster for the target metal cobalt (Co). Each row corresponds to a cluster, and each column corresponds to a scaled feature used in the clustering (for example, geochemical variables, depth, and one-hot encoded categorical features such as tectonic setting and spreading rate). Values are numeric and represent the feature’s centroid value for that cluster. See README.md in the 1_KMeans folder for the list and description of features. Attribute: Attribute_Label: SMS-MetalML/1_KMeans/output_files/cluster_centroids_Zn_(wt%)_log10.csv Attribute_Definition: Output centroid value for each cluster across scaled features. 
Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing centroid values across scaled features for each cluster for the target metal zinc (Zn). Each row corresponds to a cluster, and each column corresponds to a scaled feature used in the clustering (for example, geochemical variables, depth, and one-hot encoded categorical features such as tectonic setting and spreading rate). Values are numeric and represent the feature’s centroid value for that cluster. See README.md in the 1_KMeans folder for the list and description of features. Attribute: Attribute_Label: SMS-MetalML/1_KMeans/output_files/df_with_clusters_Au_(ppb)_log10.csv Attribute_Definition: Output cluster labels per K-Means++ sample analysis from input dataset. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing the input data with an additional Cluster column assigned by KMeans for the target metal gold (Au). Detailed attribute descriptions for the first 80 columns of this table are from the original source dataset (Input_dataset_250227.csv), and attributes are defined in SMS-MetalML_Data_Dictionary.csv. Model-generated attributes unique to this file are documented in KMeans_data_dictionary.csv. Attribute: Attribute_Label: SMS-MetalML/1_KMeans/output_files/df_with_clusters_Co_(ppm)_log10.csv Attribute_Definition: Output cluster labels per K-Means++ sample analysis from input dataset. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing the input data with an additional Cluster column assigned by KMeans for the target metal cobalt (Co). Detailed attribute descriptions for the first 80 columns of this table are from the original source dataset (Input_dataset_250227.csv), and attributes are defined in SMS-MetalML_Data_Dictionary.csv. 
Model-generated attributes unique to this file are documented in KMeans_data_dictionary.csv. Attribute: Attribute_Label: SMS-MetalML/1_KMeans/output_files/df_with_clusters_Zn_(wt%)_log10.csv Attribute_Definition: Output cluster labels per K-Means++ sample analysis from input dataset. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing the input data with an additional Cluster column assigned by KMeans for the target metal zinc (Zn). Detailed attribute descriptions for the first 80 columns of this table are from the original source dataset (Input_dataset_250227.csv), and attributes are defined in SMS-MetalML_Data_Dictionary.csv. Model-generated attributes unique to this file are documented in KMeans_data_dictionary.csv. Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/2_Random Forest Entity_Type_Definition: Folder containing Random Forest (RF) classifier scripts, input/output files, and pickled models for cobalt, gold, and zinc cluster prediction. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. Geological Survey Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/RF_model.py Attribute_Definition: Python script for training and testing the Random Forest model. Instructions included for switching target metals. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Python script to train and test the Random Forest model Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/RF_predict_new_data.py Attribute_Definition: Python script for deploying the Random Forest model to classify new data. 
Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Python script to deploy the Random Forest model to classify new data Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/Readme.md Attribute_Definition: Markdown file for SMS-MetalML Part II: Random Forest Classification Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: A markdown file with instructions on running the Part II: Random Forest classification step of the SMS-MetalML workflow Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/RandomForest_data_dictionary.csv Attribute_Definition: Data dictionary describing input and output comma-delimited table headers unique to the 2_Random Forest stage Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: A metadata csv describing the model-generated unique attributes in comma-delimited tables from the code workflow for Random Forest classification. Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/RF_[element]_model.pkl Attribute_Definition: Pickled Random Forest models for cobalt, zinc, and gold cluster prediction. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Pickled Random Forest models, where [element] is either Co, Zn, or Au. Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/one_hot_encoder.pkl Attribute_Definition: Pickled encoder for feature consistency checks before model application. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Pickled encoder fitted to training data for feature consistency Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/2_Random Forest/input_files Entity_Type_Definition: Folder containing Random Forest classifier input files. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. 
Geological Survey Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/input_files/df_with_merged_clusters.csv Attribute_Definition: Input dataset for RF_model.py containing the original KMeans input dataset (Input_dataset_250227.csv), cluster labels from the KMeans output, and metadata used for training. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing input data for Random Forest classification. Detailed attribute descriptions for the first 80 columns of this table are from the original KMeans source dataset (Input_dataset_250227.csv), and attributes are defined in SMS-MetalML_Data_Dictionary.csv. Model-generated attributes unique to this file are documented in RandomForest_data_dictionary.csv. Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/input_files/Unmeasured_dataset_InterRidge_SMS.csv Attribute_Definition: Input dataset for RF_predict_new_data.py, contains geophysical data from SMS deposits from the InterRidge Vents Database v3.4. Dataset contains spatial and tectonic data but no direct geochemical measurements. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing input data for Random Forest classification. Detailed attribute descriptions for this file are available at: https://doi.pangaea.de/10.1594/PANGAEA.917894. Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/2_Random Forest/output_files Entity_Type_Definition: Folder containing Random Forest classifier output files. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. Geological Survey Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/output_files/classification_results_[element].csv Attribute_Definition: Output file containing predicted cluster labels and vote fractions for either gold (Au), cobalt (Co), or zinc (Zn). 
Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table of test set results with predicted clusters and vote fractions. Detailed attribute descriptions for the first 83 columns of this table are from the Random Forest source dataset (df_with_merged_clusters.csv). Model-generated attributes unique to this file are documented in RandomForest_data_dictionary.csv. Attribute: Attribute_Label: SMS-MetalML/2_Random Forest/output_files/InterRidge_SMS_RF-classified_[element].csv Attribute_Definition: Output files from `RF_predict_new_data.py` for unmeasured input samples with assigned cluster labels and vote fractions. [element] is either gold (Au), cobalt (Co), or zinc (Zn). Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing output from deployment script with predicted cluster labels for new data. The first 29 columns of this table are from the Random Forest source dataset (Unmeasured_dataset_InterRidge_SMS.csv). Detailed descriptions for these attributes are available at https://doi.pangaea.de/10.1594/PANGAEA.917894. Model-generated attributes unique to this file are documented in RandomForest_data_dictionary.csv. Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/3_XGBoost Entity_Type_Definition: Folder containing XGBoost regression scripts, input/output files, and trained models to predict metal concentrations. Entity_Type_Definition_Source: U.S. Geological Survey Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/run_xgb_model.py Attribute_Definition: Python script to train and apply XGBoost regression for metal concentration prediction. Instructions included for changing metal targets. 
Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Python script to train and apply XGBoost regression Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/deploy_xgb_model.py Attribute_Definition: Python script to deploy the XGBoost model to predict element concentrations on new data. Attribute_Definition_Source: This model archive Attribute_Domain_Values: Unrepresentable_Domain: Python script to deploy the XGBoost model Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/Readme.md Attribute_Definition: Markdown file for SMS-MetalML Part III: XGBoost Regression Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: A markdown file with instructions on running the Part III: XGBoost Regression step of the SMS-MetalML workflow. Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/XGBoost_data_dictionary.csv Attribute_Definition: Data dictionary for all 3_XGBoost input and output data. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: A metadata csv describing the unique attributes in all input and output files from the code workflow for XGBoost regression. Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/xgb_model_[element].pkl Attribute_Definition: Pickled XGBoost regression models for cobalt, zinc, and gold concentration prediction. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Pickled XGBoost regression model, where [element] is either Co, Zn, or Au. Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/selected_features_[element].pkl Attribute_Definition: Pickled encoder for feature consistency checks before model application. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Pickled encoder for feature consistency checks, where [element] is either Co, Zn, or Au. 
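The feature consistency check mentioned for the pickled files above can be sketched in general terms. This is a hedged illustration only: it assumes a pickled object that is a plain list of column names, which may not match the actual contents of `selected_features_[element].pkl`, and the function names and example column names are hypothetical. The pickle module can execute arbitrary code on load, so only files from a trusted source such as the original release archive should be unpickled.

```python
import pickle


def load_feature_list(blob):
    """Unpickle a feature list from bytes.

    Assumption: the pickle holds a list of str column names.
    """
    features = pickle.loads(blob)
    if not isinstance(features, list) or not all(
            isinstance(name, str) for name in features):
        raise TypeError("expected a pickled list of column names")
    return features


def align_row(row, features):
    """Return the row's values in training order; fail on absent features."""
    missing = [name for name in features if name not in row]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Extra columns in the row are ignored; order follows the feature list.
    return [row[name] for name in features]


# Usage with hypothetical column names (not taken from the release):
blob = pickle.dumps(["depth_m", "cluster"])
ordered = align_row({"cluster": 2, "depth_m": 2500.0, "site": "A"},
                    load_feature_list(blob))
```

Ordering new data by the stored feature list before calling a model's `predict` method guards against silently shuffled or missing columns.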
Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/3_XGBoost/input_files Entity_Type_Definition: Folder containing XGBoost regression model input files. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. Geological Survey Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/input_files/dataset_with_clusters_iqr_250416.csv Attribute_Definition: Input data, including cluster labels from KMeans, for the XGBoost regression model. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Input dataset filtered by interquartile range (IQR) and containing required features and cluster assignments. Columns 1 and 5 to 83 of this comma-delimited table are described in SMS-MetalML_Data_Dictionary.csv. Columns 2 to 4 are the KMeans cluster class labels assigned for the Random Forest model input file df_with_merged_clusters.csv. Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/input_files/InterRidge_SMS_RF-classified_Co_Zn_Au.csv Attribute_Definition: Input data for `deploy_xgb_model.py`. Dataset from InterRidge Vents Database v3.4 (includes only vents that produce SMS deposits). Includes the cluster class labeling assigned from the Random Forest model. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Input dataset containing new SMS samples for concentration prediction. The first 29 columns of this table are from the Random Forest source dataset (Unmeasured_dataset_InterRidge_SMS.csv). Detailed descriptions for these attributes are available at https://doi.pangaea.de/10.1594/PANGAEA.917894. Detailed_Description: Entity_Type: Entity_Type_Label: SMS-MetalML/3_XGBoost/output_files Entity_Type_Definition: Folder containing XGBoost regression model output files. Attributes describe individual files within this folder. Entity_Type_Definition_Source: U.S. 
Geological Survey Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/output_files/dataset_with_predictions_Au.csv Attribute_Definition: Output dataset with predicted gold concentrations from the `run_xgb_model.py` script. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing predicted log-transformed metal concentrations and residuals for the test set. The first 83 columns of the table are from the input dataset for run_xgb_model.py (dataset_with_clusters_iqr_250416.csv) with the Cluster_Au column omitted. Detailed attribute descriptions for all original source-data attributes are defined in SMS-MetalML_Data_Dictionary.csv. Only model-generated attributes unique to this file are documented in XGBoost_data_dictionary.csv. Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/output_files/dataset_with_predictions_Co.csv Attribute_Definition: Output dataset with predicted cobalt concentrations from the `run_xgb_model.py` script. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing predicted log-transformed metal concentrations and residuals for the test set. The first 83 columns of the table are from the input dataset for run_xgb_model.py (dataset_with_clusters_iqr_250416.csv) with the Cluster_Co column omitted. Detailed attribute descriptions for all original source-data attributes are defined in SMS-MetalML_Data_Dictionary.csv. Only model-generated attributes unique to this file are documented in XGBoost_data_dictionary.csv. Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/output_files/dataset_with_predictions_Zn.csv Attribute_Definition: Output dataset with predicted zinc concentrations from the `run_xgb_model.py` script. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing predicted log-transformed metal concentrations and residuals for the test set. 
The first 83 columns of the table are from the input dataset for run_xgb_model.py (dataset_with_clusters_iqr_250416.csv) with the Cluster_Zn column omitted. Detailed attribute descriptions for all original source-data attributes are defined in SMS-MetalML_Data_Dictionary.csv. Only model-generated attributes unique to this file are documented in XGBoost_data_dictionary.csv. Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/output_files/predictions_Co_InterRidge_SMS_RF-classified_Co_Zn_Au.csv Attribute_Definition: Output file from `deploy_xgb_model.py` containing predictions for new samples for cobalt. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Comma-delimited table containing predicted concentrations for new SMS vent field samples. The first 32 columns are from the 3_XGBoost input file InterRidge_SMS_RF-classified_Co_Zn_Au.csv. Only model-generated attributes unique to this file are documented in XGBoost_data_dictionary.csv. Attribute: Attribute_Label: SMS-MetalML/3_XGBoost/output_files/xgb_performance_Co.csv Attribute_Definition: Output file from `run_xgb_model.py` containing RMSE and R squared values for cobalt. Attribute_Definition_Source: Producer defined Attribute_Domain_Values: Unrepresentable_Domain: Summary table of RMSE (Root Mean Square Error) and R squared values. Overview_Description: Entity_and_Attribute_Overview: Includes input and output files used in the SMS-MetalML model workflow. Supporting files include python scripts, pickled models, readme files, and data necessary for model deployment and reproducibility. Entity_and_Attribute_Detail_Citation: U.S. Geological Survey Distribution_Information: Distributor: Contact_Information: Contact_Organization_Primary: Contact_Organization: U.S. 
Geological Survey - ScienceBase Contact_Address: Address_Type: mailing and physical Address: 2885 Mission Street City: Santa Cruz State_or_Province: CA Postal_Code: 95060 Contact_Voice_Telephone: 831-427-4747 Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov Resource_Description: The models and supplementary documentation are contained in a single zip file (SMS-MetalML.zip) which also includes CSDGM FGDC compliant metadata. Distribution_Liability: Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty. Standard_Order_Process: Digital_Form: Digital_Transfer_Information: Format_Name: Model data files including comma-delimited text, Python scripts, and Python pickle files Format_Version_Number: Python 3.11.4, UTF-8, 2025, scikit-learn 1.2.2, XGBoost 1.7.5 Format_Specification: Compressed archive containing all model scripts, input/output files, and documentation. Includes all scripts used to train, validate, and apply the machine learning model (KMeans++, Random Forest, and XGBoost), CSV files containing model-derived predictions of Co, Au, and Zn concentrations for vent fields, and binary files containing trained machine learning models serialized with Python pickle format. File_Decompression_Technique: Use standard ZIP decompression tools. 
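For users who prefer to unpack the archive from Python rather than a desktop tool, a minimal sketch with the standard-library `zipfile` module follows. The function name and the destination folder are hypothetical; the archive path is whatever location the downloaded SMS-MetalML.zip was saved to.

```python
import zipfile
from pathlib import Path


def extract_release(zip_path, dest_dir):
    """Extract the archive into dest_dir and return its sorted file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Directory entries end with '/'; report regular files only.
        return sorted(name for name in archive.namelist()
                      if not name.endswith("/"))
```

A call such as `extract_release("SMS-MetalML.zip", "sms_metalml")` would return the relative paths of the extracted files, which can then be compared against the Attribute_Label entries in this record.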
Digital_Transfer_Option: Online_Option: Computer_Contact_Information: Network_Address: Network_Resource_Name: https://www.sciencebase.gov/catalog/file/get/67f010a4d4be02766d636810 Network_Resource_Name: https://doi.org/10.5066/P13PYBJL Access_Instructions: Data can be downloaded using the Network_Resource_Name links. The first link is a direct link to download the zipped file of data and metadata. The second link points to a landing page with metadata and data. Fees: None. Technical_Prerequisites: Python 3.11.4 is required to run the models. Metadata_Reference_Information: Metadata_Date: 20251216 Metadata_Contact: Contact_Information: Contact_Organization_Primary: Contact_Organization: U.S. Geological Survey, Pacific Coastal and Marine Science Center Contact_Person: PCMSC Science Data Coordinator Contact_Address: Address_Type: mailing and physical Address: 2885 Mission Street City: Santa Cruz State_or_Province: CA Postal_Code: 95060 Contact_Voice_Telephone: 831-427-4747 Contact_Electronic_Mail_Address: pcmsc_data@usgs.gov Metadata_Standard_Name: Content Standard for Digital Geospatial Metadata Metadata_Standard_Version: FGDC-STD-001-1998
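Because pickled models can fail to load or behave differently under other library versions, it may help to compare the local environment with the versions stated in Format_Version_Number and Technical_Prerequisites before running the scripts. The sketch below is an assumption-laden convenience, not part of the release: the function names are hypothetical, only the major and minor Python version are compared, and the package names are the PyPI distribution names `scikit-learn` and `xgboost`.

```python
import sys
from importlib import metadata

# Versions as stated in this metadata record.
REQUIRED_PACKAGES = {"scikit-learn": "1.2.2", "xgboost": "1.7.5"}


def python_matches(required=(3, 11), current=None):
    """Return True when the interpreter's (major, minor) equals required."""
    found = sys.version_info[:2] if current is None else tuple(current)
    return found == tuple(required)


def package_report(required=None):
    """Map each package to (installed_version_or_None, matches_required)."""
    required = REQUIRED_PACKAGES if required is None else required
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        report[name] = (installed, installed == wanted)
    return report
```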