TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	LibriSpeech test-clean	Conformer(M)	Word Error Rate (WER)	2.0	# 15
Speech Recognition	LibriSpeech test-clean	Conformer(S)	Word Error Rate (WER)	2.1	# 21
Speech Recognition	LibriSpeech test-clean	Conformer(L)	Word Error Rate (WER)	1.9	# 12
Speech Recognition	LibriSpeech test-other	Conformer(S)	Word Error Rate (WER)	5.0	# 24
Speech Recognition	LibriSpeech test-other	Conformer(M)	Word Error Rate (WER)	4.3	# 20
Speech Recognition	LibriSpeech test-other	Conformer(L)	Word Error Rate (WER)	3.9	# 12

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conformer-convolution-augmented-transformer/speech-recognition-on-librispeech-test-clean)](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-clean?p=conformer-convolution-augmented-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conformer-convolution-augmented-transformer/speech-recognition-on-librispeech-test-other)](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-other?p=conformer-convolution-augmented-transformer)`

Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020 · Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolutional neural networks and Transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. To this end, we propose the convolution-augmented Transformer for speech recognition, named Conformer. Conformer significantly outperforms previous Transformer and CNN based models, achieving state-of-the-art accuracy. On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%/4.3% without a language model and 1.9%/3.9% with an external language model on test-clean/test-other. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters.
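Every result quoted here is a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below shows one way to compute it with a word-level edit distance. The function name `word_error_rate` is hypothetical, and splitting on whitespace with no text normalization is an assumption made for illustration; this is not the scoring pipeline used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no casing or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("b" -> "x") and one deletion ("d") over 4 reference words.
    print(f"{100 * word_error_rate('a b c d', 'a x c'):.1f}%")
```

Benchmark figures are usually computed over a whole test set by summing the edit counts and reference lengths across utterances before dividing, rather than by averaging per-utterance rates.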

Code

Repository	Stars
PaddlePaddle/PaddleSpeech	10,095
wenet-e2e/wenet	3,681
alibaba-damo-academy/FunASR	3,115
lucidrains/soundstorm-pytorch	1,110
TensorSpeech/TensorFlowASR	900

See all 24 implementations

Tasks

Automatic Speech Recognition

Automatic Speech Recognition (ASR)

Language Modelling

Speech Recognition

Datasets

LibriSpeech

Results from the Paper

Ranked #12 on Speech Recognition on LibriSpeech test-other (using extra training data)

Methods

Absolute Position Encodings • Adam • BPE • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
