Speech-to-Text Translation
50 papers with code • 10 benchmarks • 3 datasets
Translate speech audio in one language into text in another language, either with a single end-to-end model or with a cascade of automatic speech recognition (ASR) followed by machine translation (MT).
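The two setups can be contrasted in a minimal sketch. All models below are hypothetical stubs (`TRANSCRIPTS`, `TRANSLATIONS`, `DIRECT_ST` are placeholder lookup tables, not real APIs); a real system would use trained ASR, MT, and end-to-end ST networks in their place.

```python
# Stub "models": real systems would replace these tables with neural networks.
TRANSCRIPTS = {"hello.wav": "hello"}    # stub ASR: audio -> source-language text
TRANSLATIONS = {"hello": "bonjour"}     # stub MT: source text -> target-language text
DIRECT_ST = {"hello.wav": "bonjour"}    # stub end-to-end ST: audio -> target text

def cascade_st(audio_path: str) -> str:
    """Cascade: transcribe with ASR, then translate the transcript with MT."""
    transcript = TRANSCRIPTS[audio_path]   # ASR stage
    return TRANSLATIONS[transcript]        # MT stage

def end_to_end_st(audio_path: str) -> str:
    """End-to-end: one model maps audio directly to target-language text,
    with no intermediate source-language transcript."""
    return DIRECT_ST[audio_path]
```

The cascade exposes an intermediate transcript (useful for inspection, but a source of compounding errors), while the end-to-end model avoids the intermediate text representation entirely.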
Libraries
Use these libraries to find Speech-to-Text Translation models and implementations.
Most implemented papers
End-to-End Automatic Speech Translation of Audiobooks
We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task.
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1.
Direct speech-to-speech translation with a sequence-to-sequence model
We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years.
FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural Networks
FlexiBO weights the improvement of the hypervolume of the Pareto region by the measurement cost of each objective, balancing the expense of collecting new information against the knowledge gained from objective evaluations and avoiding expensive measurements that yield little to no gain.
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C.
Contextualized Translation of Automatically Segmented Speech
We show that our context-aware solution is more robust to VAD-segmented input, outperforming a strong base model and fine-tuning on different VAD segmentations of an English-German test set by up to 4.25 BLEU points.
Consecutive Decoding for Speech-to-text Translation
The key idea is to generate source transcript and target translation text with a single decoder.
"Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation
Can we build a system to fully utilize signals in a parallel ST corpus?
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
We propose two variants of these architectures corresponding to two different levels of dependencies between the decoders, called the parallel and cross dual-decoder Transformers, respectively.