TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	swb_hub_500 WER fullSWBCH	VGG/Resnet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast	Percentage error	11.9	# 4
Speech Recognition	Switchboard + Hub500	RNNLM	Percentage error	6.9	# 9
Speech Recognition	Switchboard + Hub500	VGG/Resnet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast	Percentage error	6.3	# 6
Speech Recognition	Switchboard + Hub500	Microsoft 2016	Percentage error	6.2	# 5

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-microsoft-2016-conversational-speech/speech-recognition-on-swb_hub_500-wer)](https://paperswithcode.com/sota/speech-recognition-on-swb_hub_500-wer?p=the-microsoft-2016-conversational-speech)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-microsoft-2016-conversational-speech/speech-recognition-on-switchboard-hub500)](https://paperswithcode.com/sota/speech-recognition-on-switchboard-hub500?p=the-microsoft-2016-conversational-speech)`

The Microsoft 2016 Conversational Speech Recognition System

12 Sep 2016 · W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig

We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward- and backward-running RNNLMs and word-posterior-based system combination provide a 20% boost. The best single system uses a ResNet acoustic model with RNNLM rescoring and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, an improvement over previously reported results on this benchmark task.
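The "Percentage error" metric reported for these results is word error rate (WER). As a minimal sketch, assuming plain whitespace tokenization and no text normalization (official scoring tools typically normalize transcripts first), WER can be computed as the word-level edit distance divided by the reference length. The function names `word_error_rate` and `relative_reduction` are illustrative and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent between two error rates."""
    return 100.0 * (baseline - improved) / baseline
```

For instance, `relative_reduction(6.9, 6.2)` gives roughly 10.1, the relative WER reduction from the best single system (6.9%) to the combined system (6.2%) quoted in the abstract.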

Code

No code implementations yet.

Tasks

Language Modelling

Speech Recognition

Results from the Paper

Ranked #4 on Speech Recognition on swb_hub_500 WER fullSWBCH

Methods

1x1 Convolution • Average Pooling • Batch Normalization • Bottleneck Residual Block • Convolution • Global Average Pooling • Kaiming Initialization • Max Pooling • ReLU • Residual Block • Residual Connection • ResNet
