TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK	REMOVE
Speech Recognition	swb_hub_500 WER fullSWBCH	IBM (LSTM+Conformer encoder-decoder)	Percentage error	6.8	# 1
Speech Recognition	Switchboard + Hub500	IBM (LSTM+Conformer encoder-decoder)	Percentage error	4.3	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/on-the-limit-of-english-conversational-speech/speech-recognition-on-swb_hub_500-wer)](https://paperswithcode.com/sota/speech-recognition-on-swb_hub_500-wer?p=on-the-limit-of-english-conversational-speech)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/on-the-limit-of-english-conversational-speech/speech-recognition-on-switchboard-hub500)](https://paperswithcode.com/sota/speech-recognition-on-switchboard-hub500?p=on-the-limit-of-english-conversational-speech)`

On the limit of English conversational speech recognition

3 May 2021 · Zoltán Tüske, George Saon, Brian Kingsbury ·

In our previous work we demonstrated that a single headed attention encoder-decoder model is able to reach state-of-the-art results in conversational speech recognition. In this paper, we further improve the results for both Switchboard 300 and 2000. Through use of an improved optimizer, speaker vector embeddings, and alternative speech representations we reduce the recognition errors of our LSTM system on Switchboard-300 by 4% relative. Compensation of the decoder model with the probability ratio approach allows more efficient integration of an external language model, and we report 5.9% and 11.5% WER on the SWB and CHM parts of Hub5'00 with very simple LSTM models. Our study also considers the recently proposed conformer, and more advanced self-attention based language models. Overall, the conformer shows similar performance to the LSTM; nevertheless, their combination and decoding with an improved LM reaches a new record on Switchboard-300, 5.0% and 10.0% WER on SWB and CHM. Our findings are also confirmed on Switchboard-2000, and a new state of the art is reported, practically reaching the limit of the benchmark.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

English Conversational Speech Recognition

Language Modelling

speech-recognition

Speech Recognition

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Edit

Ranked #1 on Speech Recognition on Switchboard + Hub500

Get a GitHub badge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Result	Benchmark
Speech Recognition	swb_hub_500 WER fullSWBCH	IBM (LSTM+Conformer encoder-decoder)	Percentage error	6.8	# 1		Compare
Speech Recognition	Switchboard + Hub500	IBM (LSTM+Conformer encoder-decoder)	Percentage error	4.3	# 1		Compare

Methods

Add Remove

LSTM • Sigmoid Activation • Tanh Activation

Edit Social Preview

On the limit of English conversational speech recognition

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Edit

Methods

Add Remove