Speech-to-Speech Translation
27 papers with code • 3 benchmarks • 5 datasets
Speech-to-speech translation (S2ST) is the task of translating speech in one language into speech in another language. This can be done with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems; such a pipeline is text-centric. Recently, work on direct S2ST that does not rely on an intermediate text representation has been emerging.
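The cascaded pipeline above can be sketched as three composed stages. This is only an illustrative sketch: the stage functions (`asr`, `mt`, `tts`) are hypothetical stand-ins, not the API of any particular toolkit, and the toy implementations below just demonstrate the data flow (speech → source text → target text → speech).

```python
def cascaded_s2st(audio, asr, mt, tts):
    """Text-centric cascade: chain ASR, MT, and TTS stages."""
    source_text = asr(audio)        # source speech -> source-language text
    target_text = mt(source_text)   # source text   -> target-language text
    return tts(target_text)         # target text   -> target-language speech

# Toy stand-ins for the three sub-systems (hypothetical, for illustration only).
toy_asr = lambda audio: "hello world"
toy_mt = {"hello world": "hallo Welt"}.get
toy_tts = lambda text: f"<waveform:{text}>"

print(cascaded_s2st(b"\x00\x01", toy_asr, toy_mt, toy_tts))
```

A direct S2ST model would replace all three stages with a single speech-to-speech network, avoiding the intermediate text and its error propagation.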
Most implemented papers
Speech-to-speech translation for a real-world unwritten language
We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings.
A Textless Metric for Speech-to-Speech Comparison
In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts.
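The idea of comparing utterances without transcripts can be illustrated with a toy similarity over fixed-size speech embeddings. This sketch is not the paper's actual metric: the speech encoder that would produce the embeddings is assumed to exist elsewhere, and the cosine-similarity scorer below only shows the general shape of a textless comparison.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def textless_score(emb_a, emb_b):
    """Compare two utterances via their embeddings, with no transcript.

    emb_a and emb_b would come from a (hypothetical) pretrained speech
    encoder applied to the raw waveforms.
    """
    return cosine(emb_a, emb_b)

print(textless_score([0.2, 0.5, 0.1], [0.2, 0.5, 0.1]))
```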
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation
Direct S2ST suffers from the data scarcity problem because parallel corpora pairing source-language speech with target-language speech are very rare.
Dialogs Re-enacted Across Languages
To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection and its public release, and some observations and musings.
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
We enhance the model performance by subword prediction in the first-pass decoder, advanced two-pass decoder architecture design and search strategy, and better training regularization.
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems.
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis.
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community.
Textless Low-Resource Speech-to-Speech Translation With Unit Language Models
We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains (European Parliament, Common Voice, and All India Radio) with single-speaker synthesized speech data.