no code implementations • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.
1 code implementation • 14 Sep 2023 • Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng
We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.
no code implementations • 21 May 2023 • Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai
In addition, a two-pass decoding strategy is proposed to fully leverage the contextual modeling ability, resulting in better recognition performance.
1 code implementation • 18 May 2023 • Zhifu Gao, Zerui Li, JiaMing Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang
FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications.
Ranked #1 on Speech Recognition on WenetSpeech (using extra training data)
1 code implementation • 8 Mar 2023 • JiaMing Wang, Zhihao Du, Shiliang Zhang
Recently, end-to-end neural diarization (EEND) has been introduced and has achieved promising results in speaker-overlapped scenarios.
Ranked #1 on Speaker Diarization on CALLHOME
no code implementations • 1 Nov 2022 • Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai
Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks.
Automatic Speech Recognition (ASR)
no code implementations • 31 Mar 2022 • Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie
Therefore, we propose the second approach, WD-SOT, which addresses alignment errors by introducing a word-level diarization model that removes the dependency on timestamp alignment.
Automatic Speech Recognition (ASR)
1 code implementation • 18 Mar 2022 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan
Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings.
Ranked #1 on Speaker Diarization on AliMeeting
2 code implementations • 28 Nov 2021 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei
In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with the power set.
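As a rough illustration of the power-set idea mentioned above, the sketch below maps a per-frame binary speaker-activity vector to a single class index and back. The bit ordering, the function names `powerset_encode` and `powerset_decode`, and the use of all 2^N subsets are assumptions made for this example; the snippet does not state the paper's exact mapping.

```python
from typing import List


def powerset_encode(activity: List[int]) -> int:
    """Map a binary speaker-activity vector to one class index.

    Assumption: speaker i contributes bit i, so N speakers yield
    2**N possible classes (one per subset of active speakers).
    """
    label = 0
    for speaker_idx, active in enumerate(activity):
        if active not in (0, 1):
            raise ValueError("activity entries must be 0 or 1")
        label |= active << speaker_idx
    return label


def powerset_decode(label: int, num_speakers: int) -> List[int]:
    """Recover the binary speaker-activity vector from a class index."""
    if not 0 <= label < 2 ** num_speakers:
        raise ValueError("label out of range for the given speaker count")
    return [(label >> i) & 1 for i in range(num_speakers)]


# Speakers 0 and 2 talk at the same time in a 3-speaker frame.
print(powerset_encode([1, 0, 1]))   # 5
print(powerset_decode(5, 3))        # [1, 0, 1]
```

With this encoding, an overlapped frame becomes one of 2^N mutually exclusive classes, so a standard softmax classifier can be used in place of N independent binary outputs.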
no code implementations • 20 Jun 2019 • Yue Gu, Zhihao Du, HUI ZHANG, Xueliang Zhang
To improve robustness, a speech enhancement front-end is incorporated.
1 code implementation • 10 Apr 2019 • Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du
In this paper, we propose a new strategy for acoustic scene classification (ASC), namely recognizing acoustic scenes by identifying distinct sound events.