TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Noisy Speech Recognition	CHiME clean	CNN + Bi-RNN + CTC (speech to letters)	Percentage error	6.3	# 2
Noisy Speech Recognition	CHiME real	CNN + Bi-RNN + CTC (speech to letters)	Percentage error	67.94	# 5
Speech Recognition	swb_hub_500 WER fullSWBCH	CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB	Percentage error	16	# 8
Speech Recognition	Switchboard + Hub500	Deep Speech	Percentage error	20	# 30
Speech Recognition	Switchboard + Hub500	CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB	Percentage error	12.6	# 18
Speech Recognition	Switchboard + Hub500	Deep Speech + FSH	Percentage error	12.6	# 18
Accented Speech Recognition	VoxForge American-Canadian	Deep Speech	Percentage error	15.01	# 2
Accented Speech Recognition	VoxForge Commonwealth	Deep Speech	Percentage error	28.46	# 2
Accented Speech Recognition	VoxForge European	Deep Speech	Percentage error	31.20	# 2
Accented Speech Recognition	VoxForge Indian	Deep Speech	Percentage error	45.35	# 2
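The "Percentage error" values in this table are word error rates (WER); the swb_hub_500 row states WER explicitly. As a minimal sketch of how WER is defined, and not of the scoring pipeline behind these numbers, the function below computes the word-level Levenshtein distance between a reference and a hypothesis transcript and divides it by the number of reference words, i.e. WER = 100 · (S + D + I) / N. The function name `word_error_rate` is hypothetical, and official benchmark scoring (for example NIST's sclite for Hub5'00) also normalizes transcripts before aligning them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: 100 * (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Row i of the DP table holds the edit distance between the first i reference
    # words and the first j hypothesis words; prev starts as row 0.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.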

Deep Speech: Scaling up end-to-end speech recognition

17 Dec 2014 · Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
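The claim that no phoneme dictionary is needed matches the model listed in the results table, "CNN + Bi-RNN + CTC (speech to letters)": the network emits a probability distribution over characters at every audio frame, and a CTC decoder turns those frames into text. The sketch below shows best-path (greedy) CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop blanks. This is a simplification, since the paper decodes with a beam search that also scores candidates with an n-gram language model. The alphabet ordering, the `one_hot` helper, and the function names are illustrative assumptions.

```python
from typing import List, Sequence

# Hypothetical output alphabet: index 0 is the CTC blank, followed by
# space, apostrophe, and the letters a-z.
BLANK = 0
ALPHABET: List[str] = ["<blank>", " ", "'"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str] = ALPHABET,
                      blank: int = BLANK) -> str:
    """Best-path CTC decoding of per-frame label probabilities into a string."""
    # Most likely label index at each frame.
    best_path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    chars: List[str] = []
    prev = None
    for label in best_path:
        # Emit a character only when the label changes and is not the blank;
        # a blank between two identical labels lets a letter repeat ("ll").
        if label != prev and label != blank:
            chars.append(alphabet[label])
        prev = label
    return "".join(chars)


def one_hot(labels: Sequence[int], size: int = len(ALPHABET)) -> List[List[float]]:
    """Hypothetical helper: turn a label path into one-hot frame distributions."""
    return [[1.0 if k == lab else 0.0 for k in range(size)] for lab in labels]


if __name__ == "__main__":
    idx = {ch: i for i, ch in enumerate(ALPHABET)}
    path = [idx["h"], idx["h"], BLANK, idx["e"], idx["l"], idx["l"],
            BLANK, idx["l"], idx["o"], BLANK]
    print(ctc_greedy_decode(one_hot(path)))  # hello
```

The blank symbol is what lets CTC distinguish a held character ("l" over two frames) from a genuinely doubled one ("l", blank, "l").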

Code

PaddlePaddle/PaddleSpeech (marked official): 10,142 GitHub stars

mozilla/DeepSpeech: 24,285 GitHub stars

mozilla/STT: 24,284 GitHub stars

Picovoice/stt-benchmark: 586 GitHub stars

Picovoice/speech-to-text-benchmark: 586 GitHub stars

24 implementations are listed in total.

Tasks

Accented Speech Recognition

Speech Recognition

Datasets

VoxForge

Results from the Paper

Ranked #2 on Accented Speech Recognition on VoxForge American-Canadian

Methods

No methods are listed for this paper.
