TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	AMI IHM	ConformerXXL-P + Downstream NST	Word Error Rate (WER)	7.8	# 1
Speech Recognition	AMI SDM1	ConformerXXL-P	Word Error Rate (WER)	17.7	# 1
Speech Recognition	CHiME-6 dev_gss12	ConformerXXL-PS	Word Error Rate (WER)	26.2	# 2
Speech Recognition	CHiME-6 eval	ConformerXXL-PS	Word Error Rate (WER)	31	# 2
Speech Recognition	Common Voice	ConformerXXL-P + Downstream NST	Test WER	7.7%	# 1
Speech Emotion Recognition	CREMA-D	ConformerXL-P	Accuracy	88.2	# 1
Speech Recognition	TED-LIUM	ConformerXXL-PS	Word Error Rate (WER)	5	# 1
Language Identification	VoxForge	ConformerG-P	Accuracy	99.8	# 1
Speech Recognition	WSJ eval92	ConformerXXL-P	Word Error Rate (WER)	1.3	# 1
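
Most rows above report word error rate (WER). For reference, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. This page does not describe the paper's scoring pipeline, whose text normalization and tooling may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: plain whitespace tokenization, no text normalization.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

The function returns a ratio, so multiply by 100 to compare with the table if its values are percentages, as the "7.7%" entry suggests. Because insertions count as errors, the ratio can exceed 1.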

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/speech-recognition-on-ami-imh)](https://paperswithcode.com/sota/speech-recognition-on-ami-imh?p=bigssl-exploring-the-frontier-of-large-scale)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/speech-recognition-on-ami-sdm1)](https://paperswithcode.com/sota/speech-recognition-on-ami-sdm1?p=bigssl-exploring-the-frontier-of-large-scale)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/speech-recognition-on-common-voice-2)](https://paperswithcode.com/sota/speech-recognition-on-common-voice-2?p=bigssl-exploring-the-frontier-of-large-scale)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/speech-emotion-recognition-on-crema-d)](https://paperswithcode.com/sota/speech-emotion-recognition-on-crema-d?p=bigssl-exploring-the-frontier-of-large-scale)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/speech-recognition-on-ted-lium)](https://paperswithcode.com/sota/speech-recognition-on-ted-lium?p=bigssl-exploring-the-frontier-of-large-scale)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/language-identification-on-voxforge)](https://paperswithcode.com/sota/language-identification-on-voxforge?p=bigssl-exploring-the-frontier-of-large-scale)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/speech-recognition-on-wsj-eval92)](https://paperswithcode.com/sota/speech-recognition-on-wsj-eval92?p=bigssl-exploring-the-frontier-of-large-scale)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/speech-recognition-on-chime-6-dev-gss12)](https://paperswithcode.com/sota/speech-recognition-on-chime-6-dev-gss12?p=bigssl-exploring-the-frontier-of-large-scale)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bigssl-exploring-the-frontier-of-large-scale/speech-recognition-on-chime-6-eval)](https://paperswithcode.com/sota/speech-recognition-on-chime-6-eval?p=bigssl-exploring-the-frontier-of-large-scale)`

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

27 Sep 2021 · Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude of dataset sizes, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representation of pre-trained networks to achieve SoTA results on non-ASR tasks.

Code

No code implementations yet.

Tasks

Automatic Speech Recognition

Automatic Speech Recognition (ASR)

Language Identification

Speech Emotion Recognition

speech-recognition

Speech Recognition

Datasets

LibriSpeech

VoxCeleb1

AudioSet

Speech Commands

Common Voice

ESC-50

Libri-Light

TED-LIUM

CREMA-D

VoxForge

Results from the Paper

Ranked #1 on Speech Recognition on Common Voice

Methods

No methods listed for this paper.
