no code implementations • 17 Dec 2023 • Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong
The challenges of modeling such a multi-modal style controllable TTS mainly lie in two aspects:1)aligning the multi-modal information into a unified style space to enable the input of arbitrary modality as the style prompt in a single system, and 2)efficiently transferring the unified style representation into the given text content, thereby empowering the ability to generate prompt style-related voice.
no code implementations • 26 Jun 2023 • Jie Wang, Zhicong Chen, Haodong Zhou, Lin Li, Qingyang Hong
The CDGCN-based clustering method consists of graph generation, sub-graph detection, and Graph-based Overlapped Speech Detection (Graph-OSD).
1 code implementation • 14 Nov 2022 • Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong
Transformer has achieved extraordinary performance in Natural Language Processing and Computer Vision tasks thanks to its powerful self-attention mechanism, and its variant Conformer has become a state-of-the-art architecture in the field of Automatic Speech Recognition (ASR).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 24 Sep 2022 • Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li, Shipeng Xia, Jiayang Zhang, Feng Tong, Lin Li, Qingyang Hong
This paper describes a spatial-aware speaker diarization system for the multi-channel multi-party meeting.
no code implementations • 28 May 2022 • Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li
While promising performance for speaker verification has been achieved by deep speaker embeddings, the advantage would reduce in the case of speaking-style variability.
no code implementations • 25 Apr 2022 • Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li
In this work, we present a GCN-based approach for semi-supervised learning.
no code implementations • 11 Feb 2022 • Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li1, Shipeng Xia, Jiayang Zhang, Lin Li1, Qingyang Hong, Feng Tong
By performing DMSNet based OSD module, the DER of cluster-based diarization system decrease significantly form 13. 44% to 7. 63%.
no code implementations • 6 Sep 2021 • Jie Wang, Fuchuang Tong, Zhicong Chen, Lin Li, Qingyang Hong, Haodong Zhou
This paper describes the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021.
no code implementations • 23 Jul 2021 • Binling Wang, Wenxuan Hu, Jing Li, Yiming Zhi, Zheng Li, Qingyang Hong, Lin Li, Dong Wang, Liming Song, Cheng Yang
In addition to the Language Identification (LID) tasks, multilingual Automatic Speech Recognition (ASR) tasks are introduced to OLR 2021 Challenge for the first time.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 5 Jul 2021 • Jing Li, Binling Wang, Yiming Zhi, Zheng Li, Lin Li, Qingyang Hong, Dong Wang
The fifth Oriental Language Recognition (OLR) Challenge focuses on language recognition in a variety of complex environments to promote its development.
no code implementations • 30 Jun 2021 • Dexin Liao, Jing Li, Yiming Zhi, Song Li, Qingyang Hong, Lin Li
For the SV system, we proposed a multi-task learning network, where phonetic branch is trained with the character label of the utterance, and speaker branch is trained with the label of the speaker.
no code implementations • 25 Jun 2021 • Yan Liu, Zheng Li, Lin Li, Qingyang Hong
This paper proposes a multi-task learning network with phoneme-aware and channel-wise attentive learning strategies for text-dependent Speaker Verification (SV).
no code implementations • 4 Jun 2020 • Zheng Li, Miao Zhao, Qingyang Hong, Lin Li, Zhiyuan Tang, Dong Wang, Li-Ming Song, Cheng Yang
Based on Kaldi and Pytorch, recipes for i-vector and x-vector systems are also conducted as baselines for the three tasks.