Speech-to-Speech Translation

17 papers with code • 1 benchmark • 5 datasets

Speech-to-speech translation (S2ST) consists of translating speech in one language into speech in another language. This can be done with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems, an approach that is text-centric. More recently, work on S2ST that does not rely on an intermediate text representation has been emerging.
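The cascaded approach described above can be sketched as three stages composed in sequence. This is only an illustrative outline, not any listed paper's implementation: the `CascadeS2ST` class, the `Audio` alias, and the `toy_*` functions are hypothetical stand-ins for real ASR, MT, and TTS models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical alias: a waveform represented as a plain list of samples.
Audio = List[float]


@dataclass
class CascadeS2ST:
    """Text-centric cascade: speech -> text -> translated text -> speech."""
    asr: Callable[[Audio], str]   # source speech to source-language text
    mt: Callable[[str], str]      # source-language text to target-language text
    tts: Callable[[str], Audio]   # target-language text to target speech

    def translate(self, source_audio: Audio) -> Audio:
        source_text = self.asr(source_audio)
        target_text = self.mt(source_text)
        return self.tts(target_text)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_asr(audio: Audio) -> str:
    return "hola"  # pretend every input was recognized as "hola"


def toy_mt(text: str) -> str:
    return {"hola": "hello"}.get(text, text)


def toy_tts(text: str) -> Audio:
    # Encode each character as one "sample" so the output is inspectable.
    return [float(ord(ch)) for ch in text]


pipeline = CascadeS2ST(asr=toy_asr, mt=toy_mt, tts=toy_tts)
translated_audio = pipeline.translate([0.0, 0.1, 0.0])
```

Because every stage communicates through text, an error made by the ASR stage is passed on to MT and TTS, and no information outside the transcript reaches the output. Direct S2ST models instead map source speech to target speech without the intermediate text step.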



Most implemented papers

Direct speech-to-speech translation with a sequence-to-sequence model

sam2125/translatotron 12 Apr 2019

We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.

Towards Automatic Face-to-Face Translation

Rudrabha/LipGAN ACM Multimedia, 2019

As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.

ESPnet-ST: All-in-One Speech Translation Toolkit

espnet/espnet ACL 2020

We present ESPnet-ST, which is designed for the quick development of speech-to-speech translation systems in a single framework.

Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

facebookresearch/LASER NeurIPS 2021

Using a similarity metric in that multimodal embedding space, we perform mining of audio in German, French, Spanish and English from Librivox against billions of sentences from Common Crawl.

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation

google-research-datasets/cvss LREC 2022

In addition, CVSS provides normalized translation text which matches the pronunciation in the translation speech.

LibriS2S: A German-English Speech-to-Speech Translation Corpus

pedrodke/libris2s LREC 2022

In contrast, the activities in the area of speech-to-speech translation are still limited, although it is essential to overcome the language barrier.

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

fengpeng-yue/speech-to-speech-translation 18 May 2022

Direct Speech-to-speech translation (S2ST) has drawn more and more attention recently.

Speech-to-speech translation for a real-world unwritten language

facebookresearch/fairseq arXiv 2022

We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.

SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations

facebookresearch/fairseq arXiv 2022

We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings.

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

microsoft/speecht5 31 Oct 2022

However, direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language are very rare.