no code implementations • 2 Apr 2018 • Weicheng Cai, Zexin Cai, Xiang Zhang, Xiaoqi Wang, Ming Li
A novel learnable dictionary encoding layer is proposed in this paper for end-to-end language identification.
no code implementations • 2 Apr 2018 • Weicheng Cai, Zexin Cai, Wenbo Liu, Xiaoqi Wang, Ming Li
After comparing with the state-of-the-art GMM i-vector methods, we give insights into the CNN and reveal its role and effect in the whole pipeline.
no code implementations • 3 Jul 2019 • Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, Ming Li
This paper describes a conditional neural network architecture for Mandarin Chinese polyphone disambiguation.
no code implementations • 21 May 2020 • Zexin Cai, Yaogen Yang, Ming Li
In addition, we investigate the model's performance on cross-lingual synthesis, with and without a bilingual dataset during training.
no code implementations • 3 Nov 2020 • Yan Jia, Zexin Cai, Murong Ma, Zeqing Zhao, Xuyang Wang, Junjie Wang, Ming Li
Confusing words are commonly encountered in real-life keyword spotting applications and cause severe performance degradation, owing to complex spoken terms and the various kinds of words that sound similar to the predefined keywords.
no code implementations • 11 Apr 2021 • Yechen Wang, Yan Jia, Murong Ma, Zexin Cai, Ming Li
This paper introduces the system submitted by the DKU-SMIIP team for the Auto-KWS 2021 Challenge.
no code implementations • 26 Jan 2022 • Zexin Cai, Ming Li
In this paper, we propose an invertible deep learning framework called INVVC for voice conversion.
no code implementations • 18 Jun 2022 • Danwei Cai, Zexin Cai, Ming Li
An automatic speaker verification system aims to verify the speaker identity of a speech signal.
no code implementations • 1 Nov 2022 • Zexin Cai, Weiqing Wang, Ming Li
The present paper proposes a waveform boundary detection system for audio spoofing attacks containing partially manipulated segments.
no code implementations • 20 Aug 2023 • Zexin Cai, Weiqing Wang, Yikang Wang, Ming Li
This paper introduces our system designed for Track 2, which focuses on locating manipulated regions, in the second Audio Deepfake Detection Challenge (ADD 2023).
no code implementations • 3 Jan 2024 • Danwei Cai, Zexin Cai, Ming Li
Specifically, a teacher model continually refines pseudo labels through online clustering, providing dynamic supervision signals to train the student model.
1 code implementation • 6 Nov 2021 • Haozhe Zhang, Zexin Cai, Xiaoyi Qin, Ming Li
Moreover, speaker information control is added to our system to maintain the voice cloning performance.
1 code implementation • 9 Sep 2018 • Jinkun Chen, Weicheng Cai, Danwei Cai, Zexin Cai, Haibin Zhong, Ming Li
In this paper, we apply the NetFV and NetVLAD layers for the end-to-end language identification task.
1 code implementation • 10 May 2020 • Zexin Cai, Chuxiong Zhang, Ming Li
The constraint is imposed through an added loss related to the speaker identity, which is centralized to improve the speaker similarity between the synthesized speech and its natural reference audio.