TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	AISHELL-1	Paraformer-large	Word Error Rate (WER)	1.95	# 3
Speech Recognition	AISHELL-1	Paraformer-large	Params(M)	220	# 7
Speech Recognition	AISHELL-1	Paraformer	Word Error Rate (WER)	4.95	# 8
Speech Recognition	AISHELL-1	Paraformer	Params(M)	46.3	# 4
Speech Recognition	AISHELL-2	Paraformer	Word Error Rate (WER)	5.73	# 2
Speech Recognition	AISHELL-2	Paraformer-large	Word Error Rate (WER)	2.85	# 1
Speech Recognition	WenetSpeech	Paraformer-large	Character Error Rate (CER)	6.97	# 1
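
The error rates in the table are edit-distance metrics: the number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length and expressed as a percentage. The sketch below is a minimal illustration of that definition, not the scoring script used for these benchmarks; the function names are hypothetical, and stripping spaces before character-level scoring is an assumption about text normalization.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance as a percentage of the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def cer(ref: str, hyp: str) -> float:
    """Character error rate; spaces are dropped (an assumed normalization)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    return error_rate(ref.split(), hyp.split())
```

For Mandarin text, which has no whitespace word boundaries, the character-level variant is the one that applies directly; a single wrong character in a six-character reference gives `cer(...)` of about 16.7.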

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/funasr-a-fundamental-end-to-end-speech/speech-recognition-on-aishell-2)](https://paperswithcode.com/sota/speech-recognition-on-aishell-2?p=funasr-a-fundamental-end-to-end-speech)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/funasr-a-fundamental-end-to-end-speech/speech-recognition-on-wenetspeech)](https://paperswithcode.com/sota/speech-recognition-on-wenetspeech?p=funasr-a-fundamental-end-to-end-speech)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/funasr-a-fundamental-end-to-end-speech/speech-recognition-on-aishell-1)](https://paperswithcode.com/sota/speech-recognition-on-aishell-1?p=funasr-a-fundamental-end-to-end-speech)`

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

18 May 2023 · Zhifu Gao, Zerui Li, JiaMing Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications. The toolkit's flagship model, Paraformer, is a non-autoregressive end-to-end speech recognition model that has been trained on a manually annotated Mandarin speech recognition dataset that contains 60,000 hours of speech. To improve the performance of Paraformer, we have added timestamp prediction and hotword customization capabilities to the standard Paraformer backbone. In addition, to facilitate model deployment, we have open-sourced a voice activity detection model based on the Feedforward Sequential Memory Network (FSMN-VAD) and a text post-processing punctuation model based on the controllable time-delay Transformer (CT-Transformer), both of which were trained on industrial corpora. These functional modules provide a solid foundation for building high-precision long audio speech recognition services. Compared to other models trained on open datasets, Paraformer demonstrates superior performance.
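
The abstract describes three modules that together serve long audio: voice activity detection splits the recording into speech segments, the recognizer transcribes each segment, and a punctuation model post-processes the text. The sketch below shows only that composition. It does not use the FunASR API; every name in it is hypothetical, and the toy stages are placeholders standing in for FSMN-VAD, Paraformer and CT-Transformer.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start, end) sample indices, end exclusive


def transcribe_long_audio(
    samples: Sequence[float],
    detect_speech: Callable[[Sequence[float]], List[Segment]],
    recognize: Callable[[Sequence[float]], str],
    punctuate: Callable[[str], str],
) -> str:
    """Chain VAD, recognition and punctuation over one long recording."""
    pieces = []
    for start, end in detect_speech(samples):
        text = recognize(samples[start:end])
        if text:  # skip segments that produced no text
            pieces.append(text)
    # Mandarin text has no spaces, so segments are joined directly.
    return punctuate("".join(pieces))


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_vad(samples: Sequence[float]) -> List[Segment]:
    """Treat each run of non-zero samples as one speech segment."""
    segments, start = [], None
    for i, x in enumerate(samples):
        if x != 0 and start is None:
            start = i
        elif x == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


def toy_asr(segment: Sequence[float]) -> str:
    """Return a placeholder token that records the segment length."""
    return f"<{len(segment)}>"


def toy_punc(text: str) -> str:
    """Append a full stop to non-empty text."""
    return text + "。" if text else text
```

Passing the stages in as callables keeps the composition independent of any particular model, so a real VAD, recognizer or punctuation model could be swapped in without changing `transcribe_long_audio`.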

Code

alibaba-damo-academy/FunASR official

3,588

Tasks

Action Detection

Activity Detection

Speech Recognition

Datasets

AISHELL-1

AISHELL-2

WenetSpeech

Results from the Paper

Ranked #1 on Speech Recognition on WenetSpeech (using extra training data)

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Memory Network • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
