no code implementations • 7 Oct 2023 • Ze Li, Yuke Lin, Ning Jiang, Xiaoyi Qin, Guoqing Zhao, Haiying Wu, Ming Li
Utilizing the pseudo-labeling algorithm with large-scale unlabeled data becomes crucial for semi-supervised domain adaptation in speaker verification tasks.
1 code implementation • 25 Sep 2023 • Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li
It is widely acknowledged that discriminative representation for speaker verification can be extracted from verbal speech.
no code implementations • 17 Aug 2023 • Ze Li, Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li
For Track 1, we utilize a network structure based on ResNet for training.
no code implementations • 15 Aug 2023 • Ming Cheng, Weiqing Wang, Xiaoyi Qin, Yuke Lin, Ning Jiang, Guoqing Zhao, Ming Li
This paper describes the DKU-MSXF submission to track 4 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23).
no code implementations • 14 Aug 2023 • Yuke Lin, Xiaoyi Qin, Guoqing Zhao, Ming Cheng, Ning Jiang, Haiyang Wu, Ming Li
In this paper, we introduce a large-scale and high-quality audio-visual speaker verification dataset, named VoxBlink.
no code implementations • 28 Oct 2022 • Yuke Lin, Xiaoyi Qin, Huahua Cui, Zhenyi Zhu, Ming Li
We collect a set of clips with laughter components by running a laughter detection script on VoxCeleb and part of the CN-Celeb dataset.
no code implementations • 28 Oct 2022 • Ming Cheng, Weiqing Wang, Yucong Zhang, Xiaoyi Qin, Ming Li
Target-speaker voice activity detection is currently a promising approach for speaker diarization in complex acoustic environments.
no code implementations • 4 Oct 2022 • Weiqing Wang, Xiaoyi Qin, Ming Cheng, Yucong Zhang, Kangyue Wang, Ming Li
This paper describes the DKU-DukeECE submission to the 4th track of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22).
1 code implementation • 12 Sep 2022 • Xiaoyi Qin, Ming Li, Hui Bu, Shrikanth Narayanan, Haizhou Li
In addition, a supplementary set for the FFSVC2020 dataset is released this year.
no code implementations • 15 Jul 2022 • Xingming Wang, Xiaoyi Qin, Yikang Wang, Yunfei Xu, Ming Li
For CM systems, we propose two methods on top of the challenge baseline to further improve the performance, namely Embedding Random Sampling Augmentation (ERSA) and One-Class Confusion Loss (OCCL).
1 code implementation • 13 Jul 2022 • Xiaoyi Qin, Na Li, Chao Weng, Dan Su, Ming Li
In this paper, we mine cross-age test sets based on the VoxCeleb dataset and propose our age-invariant speaker representation (AISR) learning method.
no code implementations • 6 Feb 2022 • Weiqing Wang, Xiaoyi Qin, Ming Li
The multi-channel TS-VAD further reduces the DER by 28% and achieves a DER of 2.26%.
1 code implementation • 6 Nov 2021 • Haozhe Zhang, Zexin Cai, Xiaoyi Qin, Ming Li
Moreover, speaker information control is added to our system to maintain the voice cloning performance.
1 code implementation • 22 Apr 2021 • Yaogen Yang, Haozhe Zhang, Xiaoyi Qin, Shanshan Liang, Huahua Cui, Mingyang Xu, Ming Li
We achieve cross-lingual VC between Mandarin speech with multiple speakers and English speech with multiple speakers by applying bilingual bottleneck features.
no code implementations • 6 Apr 2021 • Tinglong Zhu, Xiaoyi Qin, Ming Li
Although deep neural networks are successful for many tasks in the speech domain, their high computational and memory costs make it difficult to directly deploy high-performance neural network systems on low-resource embedded devices.
no code implementations • 3 Jul 2019 • Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, Ming Li
This paper describes a conditional neural network architecture for Mandarin Chinese polyphone disambiguation.