1 code implementation • 30 Sep 2022 • Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, LiRong Dai, Jinyu Li, Furu Wei
In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation.
3 code implementations • 29 Mar 2022 • BinBin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu
Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model.
4 code implementations • 2 Feb 2021 • Zhuoyuan Yao, Di Wu, Xiong Wang, BinBin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei
In this paper, we propose an open-source, production-first, and production-ready speech recognition toolkit called WeNet, in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.
5 code implementations • 10 Dec 2020 • BinBin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei
In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.
Ranked #7 on Speech Recognition on AISHELL-1
no code implementations • 17 Nov 2020 • Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie
End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance.
Automatic Speech Recognition (ASR)
no code implementations • 13 Nov 2020 • Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao
Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.
Sound • Audio and Speech Processing
1 code implementation • 4 Nov 2020 • Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, DongYan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez
In this challenge, we open-source a sizable speech, keyword, echo, and noise corpus to promote data-driven methods, particularly deep-learning approaches, for keyword spotting (KWS) and sound source localization (SSL).
Sound • Audio and Speech Processing