no code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen
We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
2 code implementations • 23 Dec 2023 • Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen
To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.
1 code implementation • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.
2 code implementations • 7 Aug 2023 • Xian Shi, Yexin Yang, Zerui Li, Yanni Chen, Zhifu Gao, Shiliang Zhang
It possesses the advantages of AED-based model's accuracy, NAR model's efficiency, and explicit customization capacity of superior performance.
no code implementations • 18 May 2023 • Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan
Estimating confidence scores for recognition results is a classic task in ASR field and of vital importance for kinds of downstream tasks and training strategies.
1 code implementation • 18 May 2023 • Zhifu Gao, Zerui Li, JiaMing Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang
FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications.
Ranked #1 on Speech Recognition on WenetSpeech (using extra training data)
2 code implementations • 16 Jun 2022 • Zhifu Gao, Shiliang Zhang, Ian McLoughlin, Zhijie Yan
However, due to an independence assumption within the output tokens, performance of single-step NAR is inferior to that of AR models, especially with a large-scale corpus.
1 code implementation • 21 May 2020 • Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.
Sound Audio and Speech Processing