1 code implementation • 17 Apr 2025 • YiCheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, Feng Ma
This improvement stems from our integration of the strengths of LLMs and symbolic systems, which enables a more reliable and interpretable approach for the GPS task.
1 code implementation • 9 Feb 2025 • Jing-Xuan Zhang, Genshun Wan, Jianqing Gao, Zhen-Hua Ling
We also introduce a multi-teacher ensemble method to distill the student, which receives audio-visual data as inputs.
Ranked #1 on Automatic Speech Recognition (ASR) on LRS3-TED
Audio-Visual Speech Recognition
Automatic Speech Recognition
2 code implementations • 7 Feb 2025 • Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, Jun Du, Kewei Li, Ruoyu Wang, Jiefeng Ma, Lei Sun, Jianqing Gao
To address this issue, we propose Self-Loop Latent Swap, a frame-level bidirectional swap applied to the overlapping region of adjacent views.
no code implementations • 26 Sep 2024 • Mengzhi Wang, Shifu Xiong, Genshun Wan, Hang Chen, Jianqing Gao, LiRong Dai
In this work, we propose deep CLAS to make better use of contextual information.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 3 Sep 2024 • Shutong Niu, Ruoyu Wang, Jun Du, Gaobin Yang, Yanhui Tu, Siyuan Wu, Shuangqing Qian, Huaxin Wu, Haitao Xu, Xueyang Zhang, Guolong Zhong, Xindi Yu, Jieru Chen, Mengzhi Wang, Di Cai, Tian Gao, Genshun Wan, Feng Ma, Jia Pan, Jianqing Gao
This technical report outlines our submission system for the CHiME-8 NOTSOFAR-1 Challenge.
2 code implementations • 24 May 2024 • Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang, Jianqing Gao, Feng Ma
Text-to-music (TTM) generation, which converts textual descriptions into audio, opens up innovative avenues for multimedia creation.
Ranked #1 on Music Generation on Song Describer Dataset
no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao
This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.
no code implementations • 28 Aug 2023 • Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee
This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios.
no code implementations • 7 Dec 2022 • Pengcheng Li, Genshun Wan, Fenglin Ding, Hang Chen, Jianqing Gao, Jia Pan, Cong Liu
Speech pre-training has shown great success in learning useful and general latent representations from large-scale unlabeled data.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 6 Dec 2022 • Jing-Xuan Zhang, Genshun Wan, Zhen-Hua Ling, Jia Pan, Jianqing Gao, Cong Liu
AV2vec has a student and a teacher module, in which the student performs a masked latent feature regression task using the multimodal target features generated online by the teacher.