no code implementations • 7 Oct 2023 • Kaixun Huang, Ao Zhang, BinBin Zhang, Tianyi Xu, Xingchen Song, Lei Xie
However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 18 May 2023 • Xingchen Song, Di wu, BinBin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective \textbf{training-free} methods to decrease the Token Display Time (TDT) of streaming ASR models \textbf{without any accuracy loss}.
1 code implementation • 19 Apr 2023 • Yaoxun Xu, Baiji Liu, Qiaochu Huang and, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng
In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer.
1 code implementation • 1 Nov 2022 • Xingchen Song, Di wu, Zhiyong Wu, BinBin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu
In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models.
no code implementations • 31 Oct 2022 • Xingchen Song, Di wu, BinBin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu
Therefore, we name it FusionFormer.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
3 code implementations • 29 Mar 2022 • BinBin Zhang, Di wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu
Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model.
no code implementations • 28 Oct 2020 • Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng
Non-autoregressive (NAR) transformer models have achieved significantly inference speedup but at the cost of inferior accuracy compared to autoregressive (AR) models in automatic speech recognition (ASR).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 23 Oct 2019 • Xingchen Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng
Our best systems achieve a relative improvement of 11. 9% and 8. 3% on the TIMIT and WSJ tasks respectively.