no code implementations • 26 Sep 2024 • Shifu Xiong, Mengzhi Wang, Genshun Wan, Hang Chen, Jianqing Gao, LiRong Dai
In this work, we propose deep CLAS to make better use of contextual information.
Automatic Speech Recognition (ASR) +4
1 code implementation • 5 Sep 2024 • Genshun Wan, Mengzhi Wang, Tingzhi Mao, Hang Chen, Zhongfu Ye
A transducer model trained with a sequence-level criterion requires a large amount of memory because a large probability matrix must be generated.
Ranked #5 on Speech Recognition on AISHELL-1
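To give a rough sense of why the probability matrix mentioned above is costly, the following sketch estimates the size of a transducer-style output tensor, assuming the commonly used shape (batch, frames, tokens + 1, vocabulary). The function name and all sizes are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def joint_tensor_bytes(batch, frames, tokens, vocab, bytes_per_value=4):
    """Bytes needed for one (batch, frames, tokens + 1, vocab) tensor.

    Assumes a dense tensor with one value per (frame, label position,
    output unit) triple, stored as 32-bit floats by default.
    """
    return batch * frames * (tokens + 1) * vocab * bytes_per_value


# Hypothetical sizes: 16 utterances, 500 encoder frames,
# 50 label tokens, 5000 output units.
size = joint_tensor_bytes(batch=16, frames=500, tokens=50, vocab=5000)
print(f"{size:,} bytes = {size / 2**30:.2f} GiB")
```

Under these assumed sizes a single such tensor already takes several gigabytes, before counting the gradients kept for training, which is the kind of cost a memory-reduction method would target.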
no code implementations • 3 Sep 2024 • Shutong Niu, Ruoyu Wang, Jun Du, Gaobin Yang, Yanhui Tu, Siyuan Wu, Shuangqing Qian, Huaxin Wu, Haitao Xu, Xueyang Zhang, Guolong Zhong, Xindi Yu, Jieru Chen, Mengzhi Wang, Di Cai, Tian Gao, Genshun Wan, Feng Ma, Jia Pan, Jianqing Gao
This technical report outlines our submission system for the CHiME-8 NOTSOFAR-1 Challenge.
no code implementations • 28 Aug 2023 • Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee
This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios.
no code implementations • 27 Jun 2023 • Haitao Tang, Yu Fu, Lei Sun, Jiabin Xue, Dan Liu, Yongchao Li, Zhiqiang Ma, Minghui Wu, Jia Pan, Genshun Wan, Ming'en Zhao
In this paper, we propose an adaptive two-stage knowledge distillation method consisting of hidden layer learning and output layer learning.
no code implementations • 7 Dec 2022 • Fenglin Ding, Genshun Wan, Pengcheng Li, Jia Pan, Cong Liu
Multilingual end-to-end models have shown great improvement over monolingual systems.
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Dec 2022 • Pengcheng Li, Genshun Wan, Fenglin Ding, Hang Chen, Jianqing Gao, Jia Pan, Cong Liu
Speech pre-training has shown great success in learning useful and general latent representations from large-scale unlabeled data.
Automatic Speech Recognition (ASR) +2
no code implementations • 7 Dec 2022 • Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye
Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 6 Dec 2022 • Jing-Xuan Zhang, Genshun Wan, Zhen-Hua Ling, Jia Pan, Jianqing Gao, Cong Liu
AV2vec has a student and a teacher module, in which the student performs a masked latent feature regression task using the multimodal target features generated online by the teacher.