ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

21 May 2020 · Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu J. Han, Tao Lei, Tao Ma

In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling. In the hybrid ASR framework, the multistream CNN acoustic model processes an input of speech frames in multiple parallel pipelines where each stream has a unique dilation rate for diversity. Trained with the SpecAugment data augmentation method, it achieves relative word error rate (WER) improvements of 4% on test-clean and 14% on test-other. We further improve the performance via N-best rescoring using a 24-layer self-attentive SRU language model, achieving WERs of 1.75% on test-clean and 4.46% on test-other.
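The two ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the shared 3-tap kernel, the zero padding, the log-score interpolation, and `lm_weight` are all assumptions made for illustration. The first part runs one input through parallel streams that differ only in dilation rate; the second re-ranks an N-best list by adding a weighted language model score to each first-pass score.

```python
from typing import Callable, List, Sequence, Tuple


def dilated_conv1d(frames: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Centered 1-D convolution whose taps sit `dilation` frames apart.

    Positions outside the input count as zero, so the output has the
    same length as the input (an assumption of this sketch).
    """
    center = len(kernel) // 2
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = t + (k - center) * dilation
            if 0 <= i < len(frames):
                acc += w * frames[i]
        out.append(acc)
    return out


def multistream(frames: Sequence[float], kernel: Sequence[float],
                dilations: Sequence[int]) -> List[List[float]]:
    """Process the same frames in parallel streams, one dilation rate each,
    and concatenate the stream outputs per frame."""
    streams = [dilated_conv1d(frames, kernel, d) for d in dilations]
    return [[s[t] for s in streams] for t in range(len(frames))]


def rescore_nbest(nbest: Sequence[Tuple[str, float]],
                  lm_score: Callable[[str], float],
                  lm_weight: float = 0.5) -> str:
    """Return the hypothesis with the best combined score.

    `nbest` holds (text, first_pass_score) pairs; higher scores are better.
    `lm_score` stands in for any language model returning a log-score.
    """
    best = max(nbest, key=lambda h: h[1] + lm_weight * lm_score(h[0]))
    return best[0]


# Hypothetical toy data, not taken from the paper.
features = multistream([1, 2, 3, 4, 5, 6], [1, 1, 1], dilations=[1, 2])
toy_lm = {"a cat sat": -2.0, "a cat sad": -6.0}
best_text = rescore_nbest([("a cat sat", -10.0), ("a cat sad", -9.5)],
                          toy_lm.__getitem__)
```

In the toy data, the second hypothesis has the better first-pass score, but the language model term changes the ranking. A real system would use learned multi-channel kernels and a trained language model in place of these stand-ins.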

Code

No code implementations yet.

Tasks

Data Augmentation

Language Modelling

Speech Recognition

Datasets

LibriSpeech

Results from the Paper

Ranked #7 on Speech Recognition on LibriSpeech test-clean

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Speech Recognition	LibriSpeech test-clean	Multistream CNN with Self-Attentive SRU (WER includes text normalization)	Word Error Rate (WER)	1.75	#7
Speech Recognition	LibriSpeech test-other	Multistream CNN with Self-Attentive SRU	Word Error Rate (WER)	4.46	#21
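
For readers unfamiliar with the metric in the table, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below follows that standard definition; it applies no text normalization, and the function names and example sentences are assumptions for illustration, not the scoring setup behind the numbers above.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which `new_wer` reduces `baseline_wer`."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical sentences: one substitution ("sit") and one deleted "the".
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here `wer` is 2 errors over 6 reference words. Reported figures such as those in the table are computed over a whole test set, summing errors and reference words across utterances rather than averaging per-utterance rates.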

Methods

Highway Layer • Sigmoid Activation • SRU
