1 code implementation • 20 Jan 2025 • Ziling Huang, Haixin Guan, Haoran Wei, Yanhua Long
Personalized speech enhancement (PSE) methods typically rely on pre-trained speaker verification models or self-designed speaker encoders to extract target speaker clues, guiding the PSE model in isolating the desired speech.
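The conditioning idea described above can be sketched in a few lines. This is a purely illustrative toy, not the authors' model: `speaker_embedding`, `pse_mask`, and the weight matrix `W` are hypothetical stand-ins for a pre-trained speaker encoder and an enhancement network.

```python
import numpy as np

def speaker_embedding(enroll_feats):
    # Stand-in for a pre-trained speaker verification model: average the
    # enrollment-utterance features into a fixed-size speaker embedding.
    return enroll_feats.mean(axis=0)

def pse_mask(mix_frame, spk_emb, W):
    # Toy conditioning step: a gain mask for one mixture frame is predicted
    # from the frame concatenated with the target speaker's embedding,
    # so the same network can be steered toward different target speakers.
    z = W @ np.concatenate([mix_frame, spk_emb])
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid keeps the mask in (0, 1)
```

In real PSE systems the mask network is a deep model trained jointly with (or on top of) the frozen speaker encoder; the point here is only the concatenate-and-predict conditioning pattern.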
no code implementations • 24 Aug 2023 • Yu Zheng, Yajun Zhang, Chuanying Niu, Yibin Zhan, Yanhua Long, Dongxing Xu
Our final system is a fusion of six models and achieves first place in Track 1 and second place in Track 2 of VoxSRC 2023.
no code implementations • 20 Jun 2023 • Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei
Moreover, we propose to train the Aformer in a multi-pass manner, and investigate three cross-information fusion methods to effectively combine the information from both general and accent encoders.
no code implementations • 22 Nov 2022 • Xiaofeng Ge, Jiangyu Han, Haixin Guan, Yanhua Long
Recently, more and more personalized speech enhancement (PSE) systems with excellent performance have been proposed.
no code implementations • 3 Nov 2022 • Li Li, Dongxing Xu, Haoran Wei, Yanhua Long
Choosing effective target modeling units is very important and has long been a concern in end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 31 Oct 2022 • Jiangyu Han, Yuhang Cao, Heng Lu, Yanhua Long
In recent years, speaker diarization has attracted widespread attention.
Automatic Speech Recognition (ASR)
no code implementations • 23 Apr 2022 • Jiangyu Han, Yanhua Long
SCT follows a framework using two heterogeneous neural networks (HNNs) to produce high confidence pseudo labels of unlabeled real speech mixtures.
no code implementations • 4 Mar 2022 • Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang
In recent years, exploring effective sound separation (SSep) techniques to improve overlapping sound event detection (SED) has attracted increasing attention.
no code implementations • 4 Mar 2022 • Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan
Finally, we propose to integrate the loss of complex subband gain, SNR, pitch filtering strength, and an OA loss in a multi-objective learning manner to further improve the speech enhancement performance.
1 code implementation • 27 Dec 2021 • Jiangyu Han, Yanhua Long, Lukas Burget, Jan Cernocky
In particular, we find that Mixture-Remix fine-tuning with DPCCN significantly outperforms TD-SpeakerBeam for unsupervised cross-domain TSE, with around 3.5 dB SI-SNR improvement on the target-domain test set, without any source-domain performance degradation.
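The SI-SNR figure quoted above is the standard scale-invariant signal-to-noise ratio used to score target speech extraction. A minimal reference implementation of that metric (not the authors' code) looks like this:

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference so any rescaling of the
    # estimate leaves the score unchanged (the "scale-invariant" part).
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10.0 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))
```

Because of the projection, `si_snr(2.0 * ref, ref)` scores the same as a perfect estimate; a 3.5 dB gain in this metric is a substantial reduction in residual interference.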
no code implementations • 6 Jun 2021 • Jiangyu Han, Wei Rao, Yannan Wang, Yanhua Long
Moreover, new combination strategies of the CD-based spatial information and target speaker adaptation of parallel encoder outputs are also investigated.
no code implementations • 26 Mar 2021 • Tiantian Tang, Xinyuan Zhou, Yanhua Long, Yijie Li, Jiaen Liang
Domain mismatch is a noteworthy issue in acoustic event detection tasks, as the target domain data is difficult to access in most real applications.
no code implementations • 23 Mar 2021 • Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang, Yuping Wang
A good joint training framework is very helpful for improving the performance of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously.
1 code implementation • 19 Oct 2020 • Jiangyu Han, Xinyuan Zhou, Yanhua Long, Yijie Li
In this work, we propose two methods for exploiting multi-channel spatial information to extract the target speech.
Speech Extraction
Audio and Speech Processing
no code implementations • 19 Oct 2020 • Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang
Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues in a more efficient way.