TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	Common Voice French	VoxPopuli-50K (n-gram)	Test WER	9.6%	# 3
Speech Recognition	Common Voice German	VoxPopuli (n-gram)	Test WER	7.8%	# 12
Speech Recognition	Common Voice Spanish	VoxPopuli-50K (n-gram)	Test WER	10.0%	# 6

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/voxpopuli-a-large-scale-multilingual-speech/speech-recognition-on-common-voice-french)](https://paperswithcode.com/sota/speech-recognition-on-common-voice-french?p=voxpopuli-a-large-scale-multilingual-speech)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/voxpopuli-a-large-scale-multilingual-speech/speech-recognition-on-common-voice-spanish)](https://paperswithcode.com/sota/speech-recognition-on-common-voice-spanish?p=voxpopuli-a-large-scale-multilingual-speech)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/voxpopuli-a-large-scale-multilingual-speech/speech-recognition-on-common-voice-german)](https://paperswithcode.com/sota/speech-recognition-on-common-voice-german?p=voxpopuli-a-large-scale-multilingual-speech)`

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

ACL 2021 · Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux

We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning as well as semi-supervised learning. VoxPopuli also contains 1.8K hours of transcribed speeches in 16 languages and their aligned oral interpretations into 5 other languages totaling 5.1K hours. We provide speech recognition baselines and validate the versatility of VoxPopuli unlabelled data in semi-supervised learning under challenging out-of-domain settings. We will release the corpus at https://github.com/facebookresearch/voxpopuli under an open license.

Code

facebookresearch/voxpopuli (official, 493 stars)

Tasks

Representation Learning

Speech Recognition

Datasets

Introduced in the Paper:

VoxPopuli

Used in the Paper:

LibriSpeech

Common Voice

Europarl-ST

Results from the Paper

Ranked #3 on Speech Recognition on Common Voice French (using extra training data)
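
The results on this page are reported as test word error rate (WER). As a minimal sketch of how that metric is conventionally defined, the snippet below computes word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative, and the text normalization and scoring tooling actually used for the reported numbers are not stated here, so this should not be read as the authors' evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a value such as 9.6% corresponds to roughly one word-level error per ten reference words; WER can exceed 100% when the hypothesis contains many insertions.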


