Metadata:
  Identification_Information:
    Citation:
      Citation_Information:
        Originator: Evan T. Dailey
        Publication_Date: 2017
        Title: ViTexOCR; a script to extract text overlays from digital video
        Geospatial_Data_Presentation_Form: Python script, PDF documentation
        Series_Information:
          Series_Name: software release
          Issue_Identification: DOI:10.5066/F7833Q56
        Publication_Information:
          Publication_Place: Pacific Coastal and Marine Science Center, Santa Cruz, California
          Publisher: U.S. Geological Survey
        Online_Linkage: https://doi.org/10.5066/F7833Q56
        Online_Linkage: https://www.sciencebase.gov/catalog/file/get/58dd56ace4b02ff32c685954
    Description:
      Abstract: The ViTexOCR script presents a new method for extracting navigation data from videos with text overlays using optical character recognition (OCR) software. Over the past few decades, it was common for videos recorded during surveys to be overlaid with real-time satellite positioning chyrons that include latitude, longitude, date, and time, as well as other ancillary data (such as speed, heading, or user-entered identifying fields). Embedding these data in the video adds useful and accurate context to the footage, but the location data cannot be used for other purposes, such as analysis in a geographic information system (GIS), while they are available only on the video display. Extracting the text from the imagery with software allows these videos to be located and analyzed in a geospatial context. The script allows a user to select a video and to specify the text data types (for example, latitude, longitude, date, time, or other), the text color, and the pixel locations of the overlay text on a sample video frame. The script’s output is a data file containing the retrieved geospatial and temporal data. All functionality is bundled in a Python script that incorporates a graphical user interface and several other software dependencies.
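The abstract describes reading overlay text with OCR and writing the recovered positions to a data file. The ViTexOCR source is not reproduced in this record, so the following is only a minimal sketch of that final step. It assumes that OCR has already returned one time string, one latitude string, and one longitude string per sampled frame, and that coordinates are overlaid as degrees and decimal minutes (for example `36 57.1234 N`). The function names `dm_to_decimal` and `write_positions`, the regular expression, and the CSV column layout are all hypothetical. The sketch is written for Python 3, whereas the released script targets Python 2.7.

```python
import csv
import io
import re

# ASSUMPTION: overlay coordinates look like "36 57.1234 N" or "36°57.1234'N"
# (degrees, decimal minutes, hemisphere). Real overlays vary; adjust the pattern.
DM_PATTERN = re.compile(r"^\s*(\d{1,3})\D+(\d{1,2}(?:\.\d+)?)\D*([NSEW])\s*$")


def dm_to_decimal(text):
    """Convert a degrees/decimal-minutes string to signed decimal degrees.

    Returns None when the OCR text does not match the assumed format or holds
    an impossible value, so misread frames are skipped rather than mapped.
    """
    match = DM_PATTERN.match(text.upper())
    if match is None:
        return None
    degrees = int(match.group(1))
    minutes = float(match.group(2))
    hemisphere = match.group(3)
    limit = 90 if hemisphere in "NS" else 180  # latitude vs. longitude range
    if minutes >= 60 or degrees > limit:
        return None
    value = degrees + minutes / 60.0
    return -value if hemisphere in "SW" else value


def write_positions(rows, out):
    """Write (time, latitude text, longitude text) rows to `out` as CSV.

    Rows whose coordinates cannot be parsed are dropped; returns the number
    of data rows written (the header row is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(["time", "latitude", "longitude"])
    written = 0
    for time_text, lat_text, lon_text in rows:
        lat = dm_to_decimal(lat_text)
        lon = dm_to_decimal(lon_text)
        if lat is None or lon is None:
            continue
        writer.writerow([time_text.strip(), "%.6f" % lat, "%.6f" % lon])
        written += 1
    return written


if __name__ == "__main__":
    ocr_rows = [
        ("12:00:01", "36 57.1234 N", "122 01.5000 W"),
        ("12:00:02", "36 5?.12 N", "122 01.5000 W"),  # minutes garbled by OCR
    ]
    buffer = io.StringIO()
    count = write_positions(ocr_rows, buffer)
    print(count)
    print(buffer.getvalue())
```

Returning `None` for unparseable or out-of-range text, instead of raising, reflects the assumption that occasional OCR misreads are expected and that a dropped frame is preferable to a wrong position on the map.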
      Purpose: The ViTexOCR script was developed to geospatially locate videos, primarily so that videos collected through the USGS Coastal and Marine Geology Program can be included in the USGS Video and Photograph Portal.
    Time_Period_of_Content:
      Time_Period_Information:
        Single_Date/Time:
          Calendar_Date: 2017
      Currentness_Reference: publication date
    Status:
      Progress: Complete
      Maintenance_and_Update_Frequency: As needed
    Spatial_Domain:
      Bounding_Coordinates:
        West_Bounding_Coordinate: -180.0
        East_Bounding_Coordinate: 180.0
        North_Bounding_Coordinate: 90.0
        South_Bounding_Coordinate: -90.0
    Keywords:
      Theme:
        Theme_Keyword_Thesaurus: USGS Metadata Identifier
        Theme_Keyword: USGS:58dd56ace4b02ff32c685954
      Theme:
        Theme_Keyword_Thesaurus: Marine Realms Information Bank (MRIB) keywords
        Theme_Keyword: computer science
      Theme:
        Theme_Keyword_Thesaurus: USGS Thesaurus
        Theme_Keyword: scientific software
        Theme_Keyword: software development
      Theme:
        Theme_Keyword_Thesaurus: None
        Theme_Keyword: U.S. Geological Survey
        Theme_Keyword: USGS
        Theme_Keyword: Coastal and Marine Geology Program
        Theme_Keyword: CMGP
        Theme_Keyword: Pacific Coastal and Marine Science Center
        Theme_Keyword: PCMSC
    Access_Constraints: none
    Use_Constraints: USGS-authored or produced data and information are in the public domain from the U.S. Government and are freely redistributable with proper metadata and source attribution. Please recognize and acknowledge the U.S. Geological Survey as the originator(s) of this script.
    Point_of_Contact:
      Contact_Information:
        Contact_Person_Primary:
          Contact_Person: Evan T. Dailey
          Contact_Organization: U.S. Geological Survey, Pacific Coastal and Marine Science Center
        Contact_Address:
          Address_Type: mailing and physical
          Address: 2885 Mission Street
          City: Santa Cruz
          State_or_Province: CA
          Postal_Code: 95060
          Country: United States
        Contact_Voice_Telephone: 831-460-7591
        Contact_Electronic_Mail_Address: edailey@usgs.gov
    Browse_Graphic:
      Browse_Graphic_File_Name: https://www.sciencebase.gov/catalog/file/get/58dd56ace4b02ff32c685954?name=DisplayImage.png&allowOpen=true
      Browse_Graphic_File_Description: Top: example of an image from digital video with portions of the text overlays outlined; Bottom: example of navigation data extracted from digital video, displayed on a map
      Browse_Graphic_File_Type: PNG
    Native_Data_Set_Environment: The Python script, filename ViTexOCR.py, was developed using Python version 2.7.11 on Mac OS X version 10.10.4 and Windows 7 64-bit. The Python script file is 34 KB.
  Data_Quality_Information:
    Attribute_Accuracy:
      Attribute_Accuracy_Report: No formal attribute accuracy tests were conducted, nor are they applicable to these data.
    Logical_Consistency_Report: No formal logical consistency tests were conducted, nor are they applicable to these data.
    Completeness_Report: Data set is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record and accompanying documentation carefully for additional details.
    Positional_Accuracy:
      Horizontal_Positional_Accuracy:
        Horizontal_Positional_Accuracy_Report: No formal positional accuracy tests were conducted, nor are they applicable to these data.
      Vertical_Positional_Accuracy:
        Vertical_Positional_Accuracy_Report: No formal positional accuracy tests were conducted, nor are they applicable to these data.
    Lineage:
      Process_Step:
        Process_Description: A Python script was developed that incorporates optical character recognition software to geospatially locate videos collected by the USGS Coastal and Marine Geology Program.
        Process_Date: 2017
      Process_Step:
        Process_Description: Edited metadata to add a keywords section with the USGS persistent identifier as a theme keyword. No data were changed.
        Process_Date: 20201019
        Process_Contact:
          Contact_Information:
            Contact_Organization_Primary:
              Contact_Organization: U.S. Geological Survey
              Contact_Person: VeeAnn A. Cross
            Contact_Position: Marine Geologist
            Contact_Address:
              Address_Type: Mailing and Physical
              Address: 384 Woods Hole Road
              City: Woods Hole
              State_or_Province: MA
              Postal_Code: 02543-1598
            Contact_Voice_Telephone: 508-548-8700 x2251
            Contact_Facsimile_Telephone: 508-457-2310
            Contact_Electronic_Mail_Address: vatnipp@usgs.gov
  Distribution_Information:
    Distributor:
      Contact_Information:
        Contact_Organization_Primary:
          Contact_Organization: U.S. Geological Survey - ScienceBase
        Contact_Address:
          Address_Type: Mailing and Physical
          Address: Denver Federal Center, Building 810, Mail Stop 302
          City: Denver
          State_or_Province: CO
          Postal_Code: 80225
        Contact_Voice_Telephone: 1-888-275-8747
        Contact_Electronic_Mail_Address: sciencebase@usgs.gov
    Resource_Description: The ViTexOCR script is available in Python format. The script and associated files, including a PDF file documenting the use and installation of the script and CSDGM FGDC-compliant metadata, are contained in a single zip file.
    Distribution_Liability: This script has been approved for release by the U.S. Geological Survey (USGS). Although the script has been subjected to rigorous review, the USGS reserves the right to update the script as needed pursuant to further analysis and review. No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the script and related material nor shall the fact of release constitute any such warranty. Furthermore, the script is released on condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from its authorized or unauthorized use.
    Standard_Order_Process:
      Digital_Form:
        Digital_Transfer_Information:
          Format_Name: PY
          Format_Version_Number: Python 2.7.11
          Format_Specification: Python script
          Format_Information_Content: The zip file contains the Python script, two system files containing language data, a PDF file documenting the script, two sample videos in MPEG-4 format, an optional training script written in Python, and CSDGM FGDC-compliant metadata.
          File_Decompression_Technique: 7-Zip, WinZip, or Archive Utility
          Transfer_Size: 9.4
        Digital_Transfer_Option:
          Online_Option:
            Computer_Contact_Information:
              Network_Address:
                Network_Resource_Name: https://www.sciencebase.gov/catalog/file/get/58dd56ace4b02ff32c685954
                Network_Resource_Name: https://doi.org/10.5066/F7833Q56
            Access_Instructions: Data can be downloaded via the Internet.
      Fees: None.
    Technical_Prerequisites: The script was written to run with Python version 2.7 and requires FFmpeg, ImageMagick, Tesseract OCR, and several Python dependencies. See the ViTexOCR script documentation (ViTexOCR_Documentation.pdf) for a full explanation of the software requirements.
  Metadata_Reference_Information:
    Metadata_Date: 20201019
    Metadata_Contact:
      Contact_Information:
        Contact_Person_Primary:
          Contact_Person: Evan T. Dailey
          Contact_Organization: U.S. Geological Survey, Pacific Coastal and Marine Science Center
        Contact_Position: Contractor
        Contact_Address:
          Address_Type: mailing and physical
          Address: 2885 Mission Street
          City: Santa Cruz
          State_or_Province: CA
          Postal_Code: 95060
          Country: United States
        Contact_Voice_Telephone: 831-460-7591
        Contact_Electronic_Mail_Address: edailey@usgs.gov
    Metadata_Standard_Name: Content Standard for Digital Geospatial Metadata
    Metadata_Standard_Version: FGDC-STD-001-1998
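The technical prerequisites name three external programs (FFmpeg, ImageMagick, and Tesseract OCR). Before running the script, it can help to confirm that they are on the system PATH. The following is a minimal sketch that is not part of the ViTexOCR release. The executable names (`ffmpeg`, `magick` or `convert`, `tesseract`) are assumptions about a typical installation, and the helpers `find_tools` and `missing_tools` are hypothetical. The sketch uses Python 3's `shutil.which`, so it does not run under the Python 2.7 interpreter that the script itself targets. It also does not check the Python dependencies, which are described only in ViTexOCR_Documentation.pdf.

```python
import shutil

# ASSUMPTION: executable names on PATH. ImageMagick 7 installs `magick`;
# ImageMagick 6 installs `convert`. On Windows, `convert` can resolve to the
# unrelated system utility convert.exe, so `magick` is checked first.
REQUIRED_TOOLS = {
    "FFmpeg": ["ffmpeg"],
    "ImageMagick": ["magick", "convert"],
    "Tesseract OCR": ["tesseract"],
}


def find_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Map each tool name to the first matching executable path, or None."""
    found = {}
    for tool, candidates in required.items():
        found[tool] = None
        for exe in candidates:
            path = which(exe)
            if path:
                found[tool] = path
                break
    return found


def missing_tools(found):
    """Return the sorted names of tools that were not located."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    status = find_tools()
    for name in sorted(status):
        print("%-14s %s" % (name, status[name] or "NOT FOUND"))
    print("Missing: %s" % (", ".join(missing_tools(status)) or "none"))
```

Because the lookup function is passed in as `which`, a dictionary's `get` method can stand in for it. This lets the check be exercised on a machine where the tools are not installed.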