no code implementations • 24 Sep 2024 • Fengrun Zhang, Wangjin Zhou, Yiming Liu, Wang Geng, Yahui Shan, Chen Zhang
There has been an increasing research interest in cross-age speaker verification~(CASV).
no code implementations • 24 Sep 2024 • Fengrun Zhang, Wang Geng, Hukai Huang, Yahui Shan, Cheng Yi, He Qu
To further enhance the collaboration of multiple experts and leverage the understanding capabilities of LLM, we propose a two-stage progressive training strategy: 1) The connector is unfrozen and trained with language-specialized experts to map speech representations to the text space.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+6
no code implementations • 11 Aug 2024 • Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, JianHua Tao
For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the speech modality.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+5
1 code implementation • 2 Aug 2021 • Tianlong Kong, Shouyi Yin, Dawei Zhang, Wang Geng, Xin Wang, Dandan song, Jinwen Huang, Huiyu Shi, Xiaorui Wang
To address this issue, we propose a new architecture, named dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling.