ViTexOCR; a script to extract text overlays from digital video


Frequently anticipated questions:


What does this data set describe?

Title: ViTexOCR; a script to extract text overlays from digital video
Abstract:
The ViTexOCR script presents a new method for extracting navigation data from videos with text overlays using optical character recognition (OCR) software. Over the past few decades, it was common for videos recorded during surveys to be overlaid with real-time Global Positioning System (GPS) chyrons, including latitude, longitude, date and time, as well as other ancillary data (such as speed, heading, or user-input identifying fields). Embedding these data in the videos adds to their utility and accuracy, but when the data are available only on the video display, they cannot be used for other purposes, such as analysis in a geographic information system. Extracting the text data from the imagery with software allows these videos to be located and analyzed in a geospatial context.
The script allows a user to select a video, specify the text data types (e.g. latitude, longitude, date, time, or other), text color, and the pixel locations of overlay text data on a sample video frame. The script’s output is a data file containing the retrieved geospatial and temporal data. All functionality is bundled in a Python script that incorporates a graphical user interface and several other software dependencies.
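
The two per-field steps described above, isolating overlay text of the chosen color inside the chosen pixel box and turning the recognized text into coordinates a GIS can use, can be sketched in Python 3 (the released script targets Python 2.7). This is a minimal illustration under stated assumptions: the frame is held as a list of RGB rows, text color is matched by Euclidean distance with a hypothetical `tolerance`, the overlay formats accepted by `_COORD` are examples rather than formats documented for ViTexOCR, and `text_mask` and `to_decimal_degrees` are hypothetical names. The OCR call itself, which in ViTexOCR relies on Tesseract OCR, would happen between the two steps and is omitted here.

```python
import re

# --- Step 1 (before OCR): keep only pixels near the text color inside a box.
def text_mask(frame, box, color, tolerance=60):
    """Return rows of 0 (text, black) or 255 (background, white).

    frame: list of rows, each a list of (r, g, b) tuples with values 0-255.
    box: (left, top, right, bottom) in pixels; right and bottom are exclusive.
    color: (r, g, b) of the overlay text, e.g. (255, 255, 0) for yellow.
    tolerance: hypothetical maximum color distance still treated as text.
    """
    left, top, right, bottom = box
    limit = tolerance * tolerance  # compare squared distances to avoid sqrt
    mask = []
    for row in frame[top:bottom]:
        out = []
        for r, g, b in row[left:right]:
            dist2 = (r - color[0]) ** 2 + (g - color[1]) ** 2 + (b - color[2]) ** 2
            out.append(0 if dist2 <= limit else 255)
        mask.append(out)
    return mask

# --- Step 2 (after OCR): convert a coordinate string to signed decimal degrees.
# Assumed example formats: "N 36 57.1234" (degrees, decimal minutes) or
# "36 57 07.4 N" (degrees, minutes, seconds). Real overlays vary.
_COORD = re.compile(
    r"^\s*([NSEW])?\s*(\d{1,3})[\s:\u00b0]+(\d{1,2}(?:\.\d+)?)"
    r"(?:[\s:']+(\d{1,2}(?:\.\d+)?))?\s*([NSEW])?\s*$"
)

def to_decimal_degrees(text):
    match = _COORD.match(text.upper())
    if match is None:
        raise ValueError("unrecognized coordinate: %r" % text)
    prefix, deg, minutes, seconds, suffix = match.groups()
    hemisphere = prefix or suffix
    if hemisphere is None:
        raise ValueError("missing hemisphere letter: %r" % text)
    value = int(deg) + float(minutes) / 60.0
    if seconds is not None:
        value += float(seconds) / 3600.0
    # South latitudes and west longitudes are negative in decimal degrees.
    return -value if hemisphere in "SW" else value
```

For example, `to_decimal_degrees("S 12 30")` returns `-12.5`, and with yellow text a slightly off-color pixel such as `(250, 250, 10)` still falls within the default tolerance and is kept as text. Rendering text black on white is a common preparation step before OCR.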
  1. How might this data set be cited?
    Dailey, Evan T., 2017, ViTexOCR; a script to extract text overlays from digital video: software release DOI:10.5066/F7833Q56, U.S. Geological Survey, Pacific Coastal and Marine Science Center, Santa Cruz, California.

    Online Links:

  2. What geographic area does the data set cover?
    West_Bounding_Coordinate: -180.0
    East_Bounding_Coordinate: 180.0
    North_Bounding_Coordinate: 90.0
    South_Bounding_Coordinate: -90.0
  3. What does it look like?
    https://www.sciencebase.gov/catalog/file/get/58dd56ace4b02ff32c685954?name=DisplayImage.png&allowOpen=true (PNG)
    Top: example of image from digital video with portions of text overlays outlined; Bottom: example of extracted results of navigation data from digital video displayed on a map
  4. Does the data set describe conditions during a particular time period?
    Calendar_Date: 2017
    Currentness_Reference:
    publication date
  5. What is the general form of this data set?
    Geospatial_Data_Presentation_Form: Python script, PDF documentation
  6. How does the data set represent geographic features?
    1. How are geographic features stored in the data set?
    2. What coordinate system is used to represent geographic features?
  7. How does the data set describe geographic features?

Who produced the data set?

  1. Who are the originators of the data set? (may include formal authors, digital compilers, and editors)
    • Evan T. Dailey
  2. Who also contributed to the data set?
  3. To whom should users address questions about the data?
    Evan T. Dailey
    U.S. Geological Survey, Pacific Coastal and Marine Science Center
    2885 Mission Street
    Santa Cruz, CA
    United States

    831-460-7591 (voice)
    edailey@usgs.gov

Why was the data set created?

The ViTexOCR script was developed to geospatially locate videos, primarily for the purpose of including videos collected through the USGS Coastal and Marine Geology Program in the USGS Video and Photograph Portal.

How was the data set created?

  1. From what previous works were the data drawn?
  2. How were the data generated, processed, and modified?
    Date: 2017 (process 1 of 2)
    A Python script was developed that incorporates optical character recognition software to geospatially locate videos collected by the USGS Coastal and Marine Geology Program.
    Date: 19-Oct-2020 (process 2 of 2)
    Edited metadata to add a keywords section with the USGS persistent identifier as a theme keyword. No data were changed. Person who carried out this activity:
    U.S. Geological Survey
    Attn: VeeAnn A. Cross
    Marine Geologist
    384 Woods Hole Road
    Woods Hole, MA

    508-548-8700 x2251 (voice)
    508-457-2310 (FAX)
    vatnipp@usgs.gov
  3. What similar or related data should the user be aware of?

How reliable are the data; what problems remain in the data set?

  1. How well have the observations been checked?
    No formal attribute accuracy tests were conducted, nor are they applicable to these data.
  2. How accurate are the geographic locations?
    No formal positional accuracy tests were conducted, nor are they applicable to these data.
  3. How accurate are the heights or depths?
    No formal positional accuracy tests were conducted, nor are they applicable to these data.
  4. Where are the gaps in the data? What is missing?
    The data set is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record and accompanying documentation carefully for additional details.
  5. How consistent are the relationships among the observations, including topology?
    No formal logical accuracy tests were conducted, nor are they applicable to these data.

How can someone get a copy of the data set?

Are there legal restrictions on access or use of the data?
Access_Constraints: none
Use_Constraints:
USGS-authored or produced data and information are in the public domain from the U.S. Government and are freely redistributable with proper metadata and source attribution. Please recognize and acknowledge the U.S. Geological Survey as the originator(s) of this script.
  1. Who distributes the data set? (Distributor 1 of 1)
    U.S. Geological Survey - ScienceBase
    Denver Federal Center, Building 810, Mail Stop 302
    Denver, CO

    1-888-275-8747 (voice)
    sciencebase@usgs.gov
  2. What's the catalog number I need to order this data set?
    The ViTexOCR script is available in Python format. The script and associated files, including a PDF file documenting the installation and use of the script and CSDGM FGDC-compliant metadata, are contained in a single zip file.
  3. What legal disclaimers am I supposed to read?
    This script has been approved for release by the U.S. Geological Survey (USGS). Although the script has been subjected to rigorous review, the USGS reserves the right to update the script as needed pursuant to further analysis and review. No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the script and related material nor shall the fact of release constitute any such warranty. Furthermore, the script is released on condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from its authorized or unauthorized use.
  4. How can I download or order the data?
  5. What hardware or software do I need in order to use the data set?
    The script was written to run with Python version 2.7 and requires FFmpeg, ImageMagick, Tesseract OCR, and several Python dependencies. See the ViTexOCR script documentation (ViTexOCR_Documentation.pdf) for a full explanation of software requirements.
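
FFmpeg, ImageMagick, and Tesseract OCR are command-line programs, so a Python script can drive them through the standard `subprocess` module. The following Python 3 sketch builds argument lists for three typical steps: sampling frames from a video with FFmpeg, cropping one overlay field with ImageMagick's `convert`, and reading that field with Tesseract. The options, file names, and helper names (`frame_extract_cmd`, `crop_cmd`, `ocr_cmd`, `ocr_text`) are illustrative assumptions, not the commands ViTexOCR actually runs; the script's real requirements and usage are described in ViTexOCR_Documentation.pdf.

```python
import subprocess

# Hypothetical helpers: each returns an argv list for subprocess, so no shell
# quoting is needed. The options shown are common ones, not ViTexOCR's own.

def frame_extract_cmd(video, out_pattern, fps=1):
    """FFmpeg: sample `fps` frames per second from `video` into numbered images."""
    return ["ffmpeg", "-i", video, "-vf", "fps=%g" % fps, out_pattern]

def crop_cmd(image, box, out_image):
    """ImageMagick 6 `convert`: cut one overlay field out of a frame."""
    left, top, right, bottom = box  # right and bottom are exclusive
    geometry = "%dx%d+%d+%d" % (right - left, bottom - top, left, top)
    # +repage discards the virtual-canvas offset that -crop leaves behind.
    return ["convert", image, "-crop", geometry, "+repage", out_image]

def ocr_cmd(image):
    """Tesseract: an output base of "stdout" prints the text instead of writing a file."""
    return ["tesseract", image, "stdout"]

def ocr_text(image):
    """Run Tesseract on one cropped field and return the recognized text."""
    return subprocess.check_output(ocr_cmd(image)).decode("utf-8").strip()
```

For example, `crop_cmd("frame_00001.png", (10, 400, 210, 420), "lat.png")` uses the crop geometry `200x20+10+400` (width x height + x offset + y offset). ImageMagick 7 uses `magick` as its main command, so the first list element may need to change depending on the installed version.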

Who wrote the metadata?

Dates:
Last modified: 19-Oct-2020
Metadata author:
Evan T. Dailey
U.S. Geological Survey, Pacific Coastal and Marine Science Center
Contractor
2885 Mission Street
Santa Cruz, CA
United States

831-460-7591 (voice)
edailey@usgs.gov
Metadata standard:
Content Standard for Digital Geospatial Metadata (FGDC-STD-001-1998)

This page is <https://cmgds.marine.usgs.gov/catalog/pcmsc/DataReleases/ScienceBase/DR_F7833Q56/ViTexOCR_metadata.faq.html>
Generated by mp version 2.9.50 on Tue Sep 21 18:17:11 2021