Speech-to-Text Translation

29 papers with code • 6 benchmarks • 3 datasets

Speech-to-text translation converts speech audio in one language into text in another language, using either a single end-to-end model or a cascade of speech recognition followed by machine translation.
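
The difference between the cascade and end-to-end approaches can be sketched as plain function composition. This is a minimal illustration, not the implementation of any paper listed here: the type aliases, function names, and toy models below are all hypothetical stand-ins, and a real system would plug trained models into the same slots.

```python
from typing import Callable, Sequence

# Hypothetical component types; none of these names come from a real library.
Audio = Sequence[float]
ASR = Callable[[Audio], str]        # speech -> source-language transcript
MT = Callable[[str], str]           # source-language text -> target-language text
DirectST = Callable[[Audio], str]   # speech -> target-language text


def cascade_translate(audio: Audio, asr: ASR, mt: MT) -> str:
    """Cascade: transcribe the speech first, then translate the transcript."""
    transcript = asr(audio)
    return mt(transcript)


def end_to_end_translate(audio: Audio, model: DirectST) -> str:
    """End-to-end: a single model maps speech straight to target text."""
    return model(audio)


# Toy stand-ins (assumptions for illustration only) so the sketch runs.
def toy_asr(audio: Audio) -> str:
    return "hello world"


def toy_mt(text: str) -> str:
    return {"hello world": "bonjour le monde"}.get(text, "")


def toy_direct(audio: Audio) -> str:
    return "bonjour le monde"


if __name__ == "__main__":
    samples = [0.0, 0.1, -0.1]
    print(cascade_translate(samples, toy_asr, toy_mt))
    print(end_to_end_translate(samples, toy_direct))
```

In the cascade, any recognition error in the transcript is passed on to the translation step, whereas the end-to-end function never produces a source-language transcript at all.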

Most implemented papers

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

pytorch/fairseq Asian Chapter of the Association for Computational Linguistics 2020

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.

Learning Shared Semantic Space for Speech-to-Text Translation

Glaciohound/Chimera-SLT Findings (ACL) 2021

By projecting audio and text features to a common semantic representation, Chimera unifies MT and ST tasks and boosts the performance on ST benchmarks, MuST-C and Augmented Librispeech, to a new state-of-the-art.

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

mt-upc/shas 9 Feb 2022

Speech translation datasets provide manual segmentations of the audios, which are not available in real-world scenarios, and existing segmentation methods usually significantly reduce translation quality at inference time.

Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

eske/seq2seq 6 Dec 2016

This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding.

Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation

alicank/Translation-Augmented-LibriSpeech-Corpus LREC 2018

However, while large quantities of parallel texts (such as Europarl, OpenSubtitles) are available for training machine translation systems, there are no large (100h) and open source parallel corpora that include speech in a source language aligned to text in a target language.

End-to-End Automatic Speech Translation of Audiobooks

alicank/Translation-Augmented-LibriSpeech-Corpus 12 Feb 2018

We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task.

007: Democratically Finding The Cause of Packet Drops

behnazak/Vigil-007SourceCode 20 Feb 2018

Network failures continue to plague datacenter operators as their symptoms may not have direct correlation with where or why they occur.

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

0xSameer/ast NAACL 2019

Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.

Direct speech-to-speech translation with a sequence-to-sequence model

sam2125/translatotron 12 Apr 2019

We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.

CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus

facebookresearch/covost LREC 2020

Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C.