TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech-to-Text Translation	CoVoST 2 eng-X	SeamlessM4T Large	BLEU	30.6	# 1
Speech-to-Text Translation	CoVoST 2 eng-X	SeamlessM4T Medium	BLEU	26.6	# 2
Speech-to-Text Translation	CoVoST 2 X-eng	SeamlessM4T Large	BLEU	34.1	# 1
Speech-to-Text Translation	CoVoST 2 X-eng	SeamlessM4T Medium	BLEU	29.8	# 2
Speech-to-Speech Translation	CVSS	SeamlessM4T Large	ASR-BLEU	36.5	# 1
Speech-to-Speech Translation	CVSS	SeamlessM4T Large	Parameters	2.3B	# 1
Speech-to-Speech Translation	CVSS	SeamlessM4T Medium	ASR-BLEU	28.1	# 2
Speech-to-Speech Translation	CVSS	SeamlessM4T Medium	Parameters	1.2B	# 1
Automatic Speech Recognition	FLEURS	SeamlessM4T Large	Parameters	2.3B	# 1
Automatic Speech Recognition	FLEURS	SeamlessM4T Large	Word Error Rate (WER)	23.1	# 2
Automatic Speech Recognition	FLEURS	SeamlessM4T Medium	Parameters	1.2B	# 1
Automatic Speech Recognition	FLEURS	SeamlessM4T Medium	Word Error Rate (WER)	21.9	# 1
Automatic Speech Recognition	FLEURS-54	SeamlessM4T Large	Word Error Rate (WER)	23.7	# 2
Automatic Speech Recognition	FLEURS-54	SeamlessM4T Medium	Word Error Rate (WER)	22	# 1
Speech-to-Text Translation	FLEURS eng-X	SeamlessM4T Large	BLEU	21.5	# 1
Speech-to-Text Translation	FLEURS eng-X	SeamlessM4T Medium	BLEU	19.2	# 2
Speech-to-Text Translation	FLEURS X-eng	SeamlessM4T Large	BLEU	24.0	# 1
Speech-to-Text Translation	FLEURS X-eng	SeamlessM4T Medium	BLEU	20.9	# 2
Speech-to-Speech Translation	FLEURS X-eng	SeamlessM4T Large	ASR-BLEU	25.8	# 1
Speech-to-Speech Translation	FLEURS X-eng	SeamlessM4T Medium	ASR-BLEU	20.4	# 2
Machine Translation	flores95-devtest eng-X	SeamlessM4T Large	ChrF++	50.9	# 1
Machine Translation	flores95-devtest eng-X	SeamlessM4T-NLLB-1.3B	ChrF++	49.6	# 2
Machine Translation	flores95-devtest eng-X	SeamlessM4T Medium	ChrF++	48.4	# 3
Machine Translation	flores95-devtest X-eng	SeamlessM4T Large	ChrF++	60.8	# 1
Machine Translation	flores95-devtest X-eng	SeamlessM4T-NLLB-1.3B	ChrF++	60.7	# 2
Machine Translation	flores95-devtest X-eng	SeamlessM4T Medium	ChrF++	55.4	# 3
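
To make the Large versus Medium comparison in the table above easier to read, the rows can be loaded into a small Python structure and differenced. This is only an illustrative sketch: the names `RESULTS`, `LOWER_IS_BETTER`, `large_minus_medium`, and `better_model` are hypothetical, "WER" abbreviates the table's "Word Error Rate (WER)" label, and the scores are copied by hand from the table (the Parameters rows and the SeamlessM4T-NLLB-1.3B rows are left out).

```python
# Scores copied from the results table: (dataset, metric) -> score per model size.
# Only quality metrics reported for both SeamlessM4T Large and Medium are included.
RESULTS = {
    ("CoVoST 2 eng-X", "BLEU"): {"Large": 30.6, "Medium": 26.6},
    ("CoVoST 2 X-eng", "BLEU"): {"Large": 34.1, "Medium": 29.8},
    ("CVSS", "ASR-BLEU"): {"Large": 36.5, "Medium": 28.1},
    ("FLEURS", "WER"): {"Large": 23.1, "Medium": 21.9},
    ("FLEURS-54", "WER"): {"Large": 23.7, "Medium": 22.0},
    ("FLEURS eng-X", "BLEU"): {"Large": 21.5, "Medium": 19.2},
    ("FLEURS X-eng", "BLEU"): {"Large": 24.0, "Medium": 20.9},
    ("FLEURS X-eng", "ASR-BLEU"): {"Large": 25.8, "Medium": 20.4},
    ("flores95-devtest eng-X", "ChrF++"): {"Large": 50.9, "Medium": 48.4},
    ("flores95-devtest X-eng", "ChrF++"): {"Large": 60.8, "Medium": 55.4},
}

# WER is an error rate, so a lower value is better; the other metrics are
# scores where a higher value is better.
LOWER_IS_BETTER = {"WER"}


def large_minus_medium(results):
    """Return the Large-minus-Medium gap for every (dataset, metric) row."""
    return {
        key: round(scores["Large"] - scores["Medium"], 1)
        for key, scores in results.items()
    }


def better_model(metric, scores):
    """Return the model size with the better score for the given metric."""
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(scores, key=scores.get)


if __name__ == "__main__":
    gaps = large_minus_medium(RESULTS)
    for (dataset, metric), scores in RESULTS.items():
        winner = better_model(metric, scores)
        print(f"{dataset:24s} {metric:9s} gap={gaps[(dataset, metric)]:+.1f} better={winner}")
```

With these numbers, `better_model("WER", RESULTS[("FLEURS", "WER")])` selects Medium, which agrees with the table's ranking because a lower word error rate is better.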

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/speech-to-text-translation-on-covost-2-eng-x)](https://paperswithcode.com/sota/speech-to-text-translation-on-covost-2-eng-x?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/speech-to-text-translation-on-covost-2-x-eng)](https://paperswithcode.com/sota/speech-to-text-translation-on-covost-2-x-eng?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/speech-to-speech-translation-on-cvss)](https://paperswithcode.com/sota/speech-to-speech-translation-on-cvss?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/automatic-speech-recognition-on-fleurs-1)](https://paperswithcode.com/sota/automatic-speech-recognition-on-fleurs-1?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/automatic-speech-recognition-on-fleurs-54)](https://paperswithcode.com/sota/automatic-speech-recognition-on-fleurs-54?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/speech-to-text-translation-on-fleurs-eng-x)](https://paperswithcode.com/sota/speech-to-text-translation-on-fleurs-eng-x?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/speech-to-speech-translation-on-fleurs-x-eng)](https://paperswithcode.com/sota/speech-to-speech-translation-on-fleurs-x-eng?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/speech-to-text-translation-on-fleurs-x-eng)](https://paperswithcode.com/sota/speech-to-text-translation-on-fleurs-x-eng?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/machine-translation-on-flores95-devtest-eng-x)](https://paperswithcode.com/sota/machine-translation-on-flores95-devtest-eng-x?p=seamlessm4t-massively-multilingual-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seamlessm4t-massively-multilingual-multimodal/machine-translation-on-flores95-devtest-x-eng)](https://paperswithcode.com/sota/machine-translation-on-flores95-devtest-x-eng?p=seamlessm4t-massively-multilingual-multimodal)`
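
The badge snippets above all follow one URL pattern, differing only in the task slug. As a sketch of that pattern (inferred from the ten lines above, not from any documented Papers with Code API), the Markdown can be generated from a slug. The names `PAPER_SLUG` and `badge_markdown` are hypothetical.

```python
# Paper slug as it appears in every badge URL listed above.
PAPER_SLUG = "seamlessm4t-massively-multilingual-multimodal"


def badge_markdown(task_slug, paper_slug=PAPER_SLUG):
    """Build the badge Markdown for one leaderboard, following the pattern above."""
    # JSON endpoint that shields.io reads to render the badge image.
    badge_url = f"https://paperswithcode.com/badge/{paper_slug}/{task_slug}"
    image_url = f"https://img.shields.io/endpoint.svg?url={badge_url}"
    # Leaderboard page that the badge links to.
    link_url = f"https://paperswithcode.com/sota/{task_slug}?p={paper_slug}"
    return f"[![PWC]({image_url})]({link_url})"


if __name__ == "__main__":
    print(badge_markdown("speech-to-speech-translation-on-cvss"))
```

Calling `badge_markdown("speech-to-speech-translation-on-cvss")` reproduces the CVSS line in the list above, without the surrounding backticks.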

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

22 Aug 2023 · Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. By filtering this corpus and combining it with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication

Code

facebookresearch/seamless_communica… official

10,202

facebookresearch/sonar

274

Tasks

Automatic Speech Recognition

Machine Translation

Speech-to-Speech Translation

Speech-to-Text Translation

Text-to-Speech Translation

Translation

Datasets

MuST-C MUSAN VoxPopuli FLoRes-101

FLoRes-200 FLEURS VoxLingua107 CoVoST CVSS SONAR

Results from the Paper

Ranked #1 on Machine Translation on flores95-devtest eng-X
