TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	swb_hub_500 WER fullSWBCH	DNN + Dropout	Percentage error	19.1	# 11
Speech Recognition	Switchboard + Hub500	DNN	Percentage error	16	# 27
Speech Recognition	Switchboard + Hub500	DNN + Dropout	Percentage error	15	# 26

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/building-dnn-acoustic-models-for-large/speech-recognition-on-swb_hub_500-wer)](https://paperswithcode.com/sota/speech-recognition-on-swb_hub_500-wer?p=building-dnn-acoustic-models-for-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/building-dnn-acoustic-models-for-large/speech-recognition-on-switchboard-hub500)](https://paperswithcode.com/sota/speech-recognition-on-switchboard-hub500?p=building-dnn-acoustic-models-for-large)`

Building DNN Acoustic Models for Large Vocabulary Speech Recognition

30 Jun 2014 · Andrew L. Maas, Peng Qi, Ziang Xie, Awni Y. Hannun, Christopher T. Lengerich, Daniel Jurafsky, Andrew Y. Ng

Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Building neural network acoustic models requires several design decisions including network architecture, size, and training loss function. This paper offers an empirical investigation into which aspects of DNN acoustic model design are most important for speech recognition system performance. We report DNN classifier performance and final speech recognizer word error rates, and compare DNNs using several metrics to quantify factors influencing differences in task performance. Our first set of experiments uses the standard Switchboard benchmark corpus, which contains approximately 300 hours of conversational telephone speech. We compare standard DNNs to convolutional networks, and present the first experiments using locally-connected, untied neural networks for acoustic modeling. We additionally build systems on a corpus of 2,100 hours of training data by combining the Switchboard and Fisher corpora. This larger corpus allows us to more thoroughly examine performance of large DNN models, with up to ten times more parameters than those typically used in speech recognition systems. Our results suggest that a relatively simple DNN architecture and optimization technique produce strong results. These findings, along with previous work, help establish a set of best practices for building DNN hybrid speech recognition systems with maximum likelihood training. Our experiments in DNN optimization additionally serve as a case study for training DNNs with discriminative loss functions for speech tasks, as well as DNN classifiers more generally.
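The "Percentage error" figures reported for this paper are word error rates (WER). As a rough illustration of how such a number is usually derived, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the reference length, expressed as a percentage. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; this is not the paper's scoring pipeline, and benchmark scoring tools typically apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (S + D + I) / N.

    Hypothetical helper for illustration only. Words are split on
    whitespace and no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word dropped)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution in a four-word reference.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, a WER computed this way can exceed 100 when the hypothesis is much longer than the reference.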

Code

pannous/caffe-speech-recognition

323

Tasks

Speech Recognition

Results from the Paper

Ranked #11 on Speech Recognition on swb_hub_500 WER fullSWBCH

