Corpus Creation and Evaluation for Speech-to-Text and Speech Translation

The National Virtual Translation Center (NVTC) seeks to acquire human language technology (HLT) tools that will facilitate its mission to provide verbatim English translations of foreign language audio and video files. In the text domain, NVTC has been using translation memory (TM) for some time and has reported on the incorporation of machine translation (MT) into that workflow (Miller et al., 2020). While we have explored the use of speech-totext (STT) and speech translation (ST) in the past (Tzoukermann and Miller, 2018), we have now invested in the creation of a substantial human-made corpus to thoroughly evaluate alternatives. Results from our analysis of this corpus and the performance of HLT tools point the way to the most promising ones to deploy in our workflow.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here