47 papers with code • 6 benchmarks • 3 datasets
Translate speech audio in one language into text in another language, either with a single end-to-end model or with a cascade of speech recognition followed by machine translation.
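The difference between the two approaches can be sketched with toy stand-in models. The lookup tables below are purely illustrative, not real ASR or MT systems; the point is only the shape of the two pipelines.

```python
# Toy illustration of the two approaches to speech-to-text translation.
# The "models" below are stand-in lookup tables, not real systems.

def toy_asr(audio: str) -> str:
    # Pretend the audio clip maps to a source-language transcript.
    return {"audio_bonjour": "bonjour le monde"}[audio]

def toy_mt(text: str) -> str:
    # Pretend to translate the transcript into the target language.
    return {"bonjour le monde": "hello world"}[text]

def cascade_s2t(audio: str) -> str:
    # Cascade: transcribe first, then translate the transcript.
    return toy_mt(toy_asr(audio))

def end_to_end_s2t(audio: str) -> str:
    # End-to-end: one model maps speech directly to target text,
    # with no intermediate transcript.
    return {"audio_bonjour": "hello world"}[audio]

print(cascade_s2t("audio_bonjour"))     # hello world
print(end_to_end_s2t("audio_bonjour"))  # hello world
```

The cascade can reuse strong off-the-shelf ASR and MT components but propagates transcription errors; the end-to-end model avoids that error propagation at the cost of needing paired speech-translation training data.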
Libraries: Use these libraries to find Speech-to-Text Translation models and implementations.
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.
Speech translation has recently become an increasingly popular topic of research, partly due to the development of benchmark datasets.
By projecting audio and text features to a common semantic representation, Chimera unifies MT and ST tasks and boosts the performance on ST benchmarks, MuST-C and Augmented Librispeech, to a new state-of-the-art.
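The shared-representation idea behind this line of work can be illustrated with a minimal sketch: two projections (random here; learned in practice) map mean-pooled audio and text features into a single space of equal dimension, where one downstream decoder could consume either modality. The feature dimensions and mean-pooling are assumptions for illustration, not Chimera's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 80-dim audio frames, 512-dim text embeddings,
# both projected into a 256-dim shared semantic space.
AUDIO_DIM, TEXT_DIM, SHARED_DIM = 80, 512, 256

# One projection per modality into the common space.
W_audio = rng.standard_normal((AUDIO_DIM, SHARED_DIM)) / np.sqrt(AUDIO_DIM)
W_text = rng.standard_normal((TEXT_DIM, SHARED_DIM)) / np.sqrt(TEXT_DIM)

def project(features: np.ndarray, W: np.ndarray) -> np.ndarray:
    # Mean-pool over the time/token axis, project, then L2-normalize.
    pooled = features.mean(axis=0)
    z = pooled @ W
    return z / np.linalg.norm(z)

audio_feats = rng.standard_normal((120, AUDIO_DIM))  # 120 audio frames
text_feats = rng.standard_normal((12, TEXT_DIM))     # 12 text tokens

z_audio = project(audio_feats, W_audio)
z_text = project(text_feats, W_text)

# Both modalities now live in the same space, so their agreement can be
# measured directly as a cosine similarity.
similarity = float(z_audio @ z_text)
```

In a real system the projections would be trained (e.g. with a contrastive or alignment loss) so that matching speech and text land close together, letting MT data supervise the ST task.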
Speech translation datasets provide manual segmentations of the audio, which are not available in real-world scenarios, and existing automatic segmentation methods usually reduce translation quality significantly at inference time.
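The simplest automatic stand-in for manual segmentation is fixed-length windowing, sketched below. The window and stride lengths are arbitrary illustrative values; this naive scheme cuts mid-utterance, which is exactly why it degrades translation quality compared to manual or pause-aware segmentation.

```python
def segment(num_samples: int, sr: int = 16000,
            win_s: float = 20.0, stride_s: float = 15.0) -> list[tuple[int, int]]:
    """Split an unsegmented audio stream into overlapping fixed-length windows.

    A naive stand-in for manual segmentation: real segmenters cut at
    pauses or use learned boundaries to avoid splitting words.
    """
    win, stride = int(win_s * sr), int(stride_s * sr)
    segments = []
    start = 0
    while start < num_samples:
        segments.append((start, min(start + win, num_samples)))
        start += stride
    return segments

# A 50-second recording at 16 kHz yields four overlapping windows:
segs = segment(50 * 16000)
# -> [(0, 320000), (240000, 560000), (480000, 800000), (720000, 800000)]
```

The 5-second overlap between consecutive windows gives the translation model some context across cuts, but overlapping outputs then have to be merged, which introduces its own errors.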
Recently, speech representation learning has improved many speech-related tasks such as speech recognition, speech classification, and speech-to-text translation.
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?
This paper presents a first attempt to build an end-to-end speech-to-text translation system that uses no source-language transcriptions during either training or decoding.
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation
However, while large parallel text corpora (such as Europarl and OpenSubtitles) are available for training machine translation systems, there are no large (100h), open-source parallel corpora that align speech in a source language to text in a target language.