Speech-to-Text Translation
50 papers with code • 10 benchmarks • 3 datasets
Translate speech audio in one language into text in another language, either with a single end-to-end model or with a cascade of automatic speech recognition (ASR) followed by machine translation (MT).
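The two setups can be contrasted in a minimal sketch. All models below are hypothetical stubs (`TRANSCRIPTS`, `TRANSLATIONS`, `DIRECT_ST` are placeholder lookup tables, not real APIs); a real system would use trained ASR, MT, and end-to-end ST networks in their place.

```python
# Stub "models": real systems would replace these tables with neural networks.
TRANSCRIPTS = {"hello.wav": "hello"}    # stub ASR: audio -> source-language text
TRANSLATIONS = {"hello": "bonjour"}     # stub MT: source text -> target-language text
DIRECT_ST = {"hello.wav": "bonjour"}    # stub end-to-end ST: audio -> target text

def cascade_st(audio_path: str) -> str:
    """Cascade: transcribe with ASR, then translate the transcript with MT."""
    transcript = TRANSCRIPTS[audio_path]   # ASR stage
    return TRANSLATIONS[transcript]        # MT stage

def end_to_end_st(audio_path: str) -> str:
    """End-to-end: one model maps audio directly to target-language text,
    with no intermediate source-language transcript."""
    return DIRECT_ST[audio_path]
```

The cascade exposes an intermediate transcript (useful for inspection, but a source of compounding errors), while the end-to-end model avoids the intermediate text representation entirely.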
Libraries
Use these libraries to find Speech-to-Text Translation models and implementations.
Most implemented papers
End-to-End Automatic Speech Translation of Audiobooks
We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task.
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1.
Direct speech-to-speech translation with a sequence-to-sequence model
We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years.
FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural Networks
FlexiBO weights the improvement of the hypervolume of the Pareto region by the measurement cost of each objective, balancing the expense of collecting new information against the knowledge gained from objective evaluations and avoiding expensive measurements that yield little to no gain.
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C.
Contextualized Translation of Automatically Segmented Speech
We show that our context-aware solution is more robust to VAD-segmented input, outperforming a strong base model and fine-tuning on different VAD segmentations of an English-German test set by up to 4.25 BLEU points.
Consecutive Decoding for Speech-to-text Translation
The key idea is to generate source transcript and target translation text with a single decoder.
"Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation
Can we build a system to fully utilize signals in a parallel ST corpus?
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
We propose two variants of these architectures corresponding to two different levels of dependencies between the decoders, called the parallel and cross dual-decoder Transformers, respectively.