Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

10 Dec 2020  ·  Binbin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model. Our model adopts the hybrid CTC/attention architecture, in which the conformer layers in the encoder are modified. We propose a dynamic chunk-based attention strategy to allow arbitrary right context length. At inference time, the CTC decoder generates n-best hypotheses in a streaming way, and the inference latency can be easily controlled simply by changing the chunk size. The CTC hypotheses are then rescored by the attention decoder to get the final result. This efficient rescoring process introduces very little sentence-level latency. Our experiments on the open 170-hour AISHELL-1 dataset show that the proposed method unifies the streaming and non-streaming models simply and efficiently. On the AISHELL-1 test set, our unified model achieves a 5.60% relative character error rate (CER) reduction in non-streaming ASR compared to a standard non-streaming transformer. The same model achieves 5.42% CER with 640 ms latency in a streaming ASR system.
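Two ideas in the abstract lend themselves to a short illustration: the dynamic chunk-based attention mask, in which latency is governed purely by the chunk size, and the second pass that rescores the streaming CTC n-best list with the attention decoder. Below is a minimal sketch in plain NumPy, not the authors' released code; the function names, the toy inputs, and the `ctc_weight` interpolation are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal sketch (illustrative, not the authors' implementation) of the two
# ideas in the abstract: a chunk-based self-attention mask whose chunk size
# sets the streaming latency, and a second pass that rescores the CTC n-best
# hypotheses with attention-decoder scores.
import numpy as np

def chunk_attention_mask(num_frames: int, chunk_size: int) -> np.ndarray:
    """Boolean (num_frames, num_frames) mask; True = frame may attend.

    Each frame sees its own chunk and all preceding chunks, so the right
    context, and hence the latency, is bounded by the chunk size.
    """
    mask = np.zeros((num_frames, num_frames), dtype=bool)
    for t in range(num_frames):
        chunk_end = (t // chunk_size + 1) * chunk_size  # end of t's chunk
        mask[t, :min(chunk_end, num_frames)] = True
    return mask

def rescore(nbest, ctc_scores, att_scores, ctc_weight=0.5):
    """Second pass: pick the CTC hypothesis the attention decoder scores best.

    A weighted sum of the two log-scores is one common choice; the exact
    interpolation here is an assumption, not the paper's formula.
    """
    combined = [
        (1.0 - ctc_weight) * a + ctc_weight * c
        for c, a in zip(ctc_scores, att_scores)
    ]
    return nbest[int(np.argmax(combined))]

# chunk_size == num_frames recovers full-context (non-streaming) attention;
# a small chunk_size yields a low-latency streaming mask from the same model.
print(chunk_attention_mask(6, 2).astype(int))
print(rescore(["hyp_a", "hyp_b"], ctc_scores=[-3.2, -2.9], att_scores=[-1.1, -1.8]))
```

The same weights thus serve both modes at inference: a full-context mask for non-streaming decoding, a small chunk size for streaming.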


Datasets

AISHELL-1

Results from the Paper


Task                Dataset    Model    Metric                 Value  Global Rank
Speech Recognition  AISHELL-1  U2       Word Error Rate (WER)  4.72   # 6
Speech Recognition  AISHELL-1  U2       Params (M)             47     # 5
Speech Recognition  AISHELL-1  CTC/Att  Word Error Rate (WER)  4.72   # 6

Methods


Hybrid CTC/Attention, Conformer, Dynamic Chunk-based Attention, Attention Rescoring