GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

13 Jun 2021 · Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts, and YouTube, covering both read and spontaneous speaking styles and a variety of topics such as arts, science, and sports. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training and to filter out segments with low-quality transcriptions. For system training, GigaSpeech provides five subsets of different sizes: 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage; for all our other, smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality. Baseline systems are provided for popular speech recognition toolkits, namely Athena, ESPnet, Kaldi, and Pika.
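
The subset construction in the abstract combines a per-segment WER cap (4% for XL, 0% for every smaller subset) with an hour budget per subset. The Python sketch below illustrates that policy under stated assumptions and is not the authors' pipeline: each segment is assumed to carry a precomputed validation WER (the hypothetical `validation_wer` field), the greedy first-come selection order is a hypothetical choice, and the subset names XS/S/M/L are assumed from the GigaSpeech release naming (only XL is named in the abstract).

```python
from dataclasses import dataclass

# Hour budgets from the abstract. The names XS/S/M/L are an assumption
# based on the GigaSpeech release; only "XL" appears in the abstract.
SUBSET_HOURS = {"XS": 10, "S": 250, "M": 1000, "L": 2500, "XL": 10000}


def wer_cap(subset: str) -> float:
    """WER cap from the abstract: 4% for XL, 0% for all smaller subsets."""
    return 0.04 if subset == "XL" else 0.0


@dataclass
class Segment:
    seg_id: str
    duration_s: float      # segment length in seconds
    validation_wer: float  # hypothetical field: WER of an automatic transcript
                           # measured against the segment's original text


def build_subset(segments: list[Segment], subset: str) -> list[Segment]:
    """Keep segments whose validation WER is within the subset's cap,
    stopping once the hour budget would be exceeded (hypothetical greedy policy)."""
    cap = wer_cap(subset)
    budget_s = SUBSET_HOURS[subset] * 3600.0
    chosen: list[Segment] = []
    total_s = 0.0
    for seg in segments:
        if seg.validation_wer > cap:
            continue  # transcript judged too unreliable for this subset
        if total_s + seg.duration_s > budget_s:
            break  # adding this segment would exceed the hour budget
        chosen.append(seg)
        total_s += seg.duration_s
    return chosen


if __name__ == "__main__":
    segs = [Segment("a", 5.0, 0.0), Segment("b", 4.0, 0.03), Segment("c", 6.0, 0.10)]
    print([s.seg_id for s in build_subset(segs, "XL")])  # ['a', 'b']
    print([s.seg_id for s in build_subset(segs, "S")])   # ['a']
```

The same segment pool thus yields a larger, slightly noisier XL subset and stricter smaller subsets, which matches the trade-off the abstract describes.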

Code

SpeechColab/GigaSpeech (official, 596 GitHub stars)

speechtranslation/gigas2s

Tasks

Sentence

Speech Recognition

Datasets

Introduced in the Paper:

GigaSpeech

Results from the Paper

Ranked #1 on Speech Recognition on GigaSpeech

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Speech Recognition	GigaSpeech	Conformer/Transformer-AED	Word Error Rate (WER, %)	10.90	#1
Speech Recognition	GigaSpeech DEV	Conformer/Transformer-AED	Word Error Rate (WER, %)	10.90	#1
Speech Recognition	GigaSpeech TEST	Conformer/Transformer-AED	Word Error Rate (WER, %)	10.80	#1
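
The WER figures in the table are percentages: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The minimal Python sketch below computes that quantity with a word-level edit-distance (Levenshtein) dynamic program. It returns a fraction (multiply by 100 to compare with the table) and applies no text normalization, which an official scoring script may perform, so it illustrates the metric rather than reproducing the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (initially: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            curr[j] = min(prev[j] + 1,      # deletion of ref[i-1]
                          curr[j - 1] + 1,  # insertion of hyp[j-1]
                          sub)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Keeping only two rows of the dynamic-programming table makes memory linear in the hypothesis length, which is sufficient when only the error count, not the alignment itself, is needed.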

Methods

No methods listed for this paper.