no code implementations • 24 Aug 2023 • Yu Zheng, Yajun Zhang, Chuanying Niu, Yibin Zhan, Yanhua Long, Dongxing Xu
Our final system is a fusion of six models and achieves first place in Track 1 and second place in Track 2 of VoxSRC 2023.
no code implementations • 20 Jun 2023 • Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei
Moreover, we propose to train the Aformer in a multi-pass manner, and investigate three cross-information fusion methods to effectively combine the information from both general and accent encoders.
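The abstract names cross-information fusion between a general encoder and an accent encoder without detailing the operations. As an illustration only (the Aformer's actual fusion methods may differ), here is a minimal numpy sketch of two standard ways to combine two encoders' frame-level outputs: concatenation with a projection, and cross-attention with a residual connection.

```python
import numpy as np

def concat_fusion(general_feats, accent_feats, proj):
    """Fuse two encoders' frame-level outputs by concatenation + projection.
    general_feats, accent_feats: (T, D); proj: (2D, D)."""
    fused = np.concatenate([general_feats, accent_feats], axis=-1)  # (T, 2D)
    return fused @ proj                                             # (T, D)

def cross_attention_fusion(query_feats, key_feats):
    """Fuse by letting one encoder's frames attend over the other's."""
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)            # (T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax
    return query_feats + weights @ key_feats                   # residual add

rng = np.random.default_rng(0)
g = rng.standard_normal((5, 8))            # general-encoder frames
a = rng.standard_normal((5, 8))            # accent-encoder frames
proj = rng.standard_normal((16, 8)) * 0.1  # hypothetical projection matrix
print(concat_fusion(g, a, proj).shape)      # (5, 8)
print(cross_attention_fusion(g, a).shape)   # (5, 8)
```

Both variants keep the fused representation at the original feature dimension, so either can drop into the same downstream decoder.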
no code implementations • 22 Nov 2022 • Xiaofeng Ge, Jiangyu Han, Haixin Guan, Yanhua Long
Recently, many personalized speech enhancement (PSE) systems with excellent performance have been proposed.
no code implementations • 3 Nov 2022 • Li Li, Dongxing Xu, Haoran Wei, Yanhua Long
Exploiting effective target modeling units is very important and has always been a concern in end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 31 Oct 2022 • Jiangyu Han, Yuhang Cao, Heng Lu, Yanhua Long
In recent years, speaker diarization has attracted widespread attention.
Automatic Speech Recognition (ASR) +4
no code implementations • 23 Apr 2022 • Jiangyu Han, Yanhua Long
SCT follows a framework that uses two heterogeneous neural networks (HNNs) to produce high-confidence pseudo-labels for unlabeled real speech mixtures.
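The core idea of using two heterogeneous networks to pseudo-label real mixtures can be sketched as follows. This is a simplified illustration, not SCT's actual recipe: here cross-model agreement (SI-SNR between the two models' outputs) stands in for whatever confidence measure the paper uses, and the models are toy functions.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR between two signals, in dB."""
    ref_energy = np.sum(ref ** 2) + eps
    proj = (np.sum(est * ref) / ref_energy) * ref  # projection onto ref
    noise = est - proj
    return 10 * np.log10((np.sum(proj ** 2) + eps) / (np.sum(noise ** 2) + eps))

def select_pseudo_labels(mixtures, model_a, model_b, agree_db=10.0):
    """Keep mixtures where two heterogeneous models agree strongly;
    use one model's estimate as the pseudo-label."""
    selected = []
    for mix in mixtures:
        est_a, est_b = model_a(mix), model_b(mix)
        if si_snr(est_a, est_b) >= agree_db:  # agreement as confidence
            selected.append((mix, est_a))
    return selected

rng = np.random.default_rng(0)
mixtures = [rng.standard_normal(100) for _ in range(3)]
model_a = lambda m: 0.9 * m            # toy "separators"
model_b_agree = lambda m: 0.8 * m      # agrees with model_a up to scale
model_b_differ = lambda m: np.roll(m, 50)  # decorrelated output

print(len(select_pseudo_labels(mixtures, model_a, model_b_agree)))  # 3
print(len(select_pseudo_labels(mixtures, model_a, model_b_differ)))  # 0
```

Because SI-SNR is scale-invariant, the two agreeing toy models pass the threshold despite their different gains, while the decorrelated model is filtered out.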
no code implementations • 4 Mar 2022 • Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan
Finally, we propose to integrate the losses on complex subband gain, SNR, and pitch filtering strength with an OA loss in a multi-objective learning manner to further improve speech enhancement performance.
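A multi-objective loss of this kind is typically a weighted sum of the individual terms. The sketch below is generic: the gain-MSE and negative-SNR terms are common placeholder objectives, not the paper's exact definitions of the subband-gain, pitch-filtering, or OA losses.

```python
import numpy as np

def gain_mse(est_gain, ref_gain):
    """MSE between estimated and reference (sub)band gains."""
    return float(np.mean((est_gain - ref_gain) ** 2))

def neg_snr_db(est, ref, eps=1e-8):
    """Negative SNR in dB, so that lower is better as a loss."""
    noise = est - ref
    return float(-10.0 * np.log10((np.sum(ref ** 2) + eps)
                                  / (np.sum(noise ** 2) + eps)))

def multi_objective(losses, weights):
    """Combine several objectives into one scalar training loss."""
    assert len(losses) == len(weights)
    return float(sum(w * l for w, l in zip(weights, losses)))

rng = np.random.default_rng(0)
ref = rng.standard_normal(200)
est = ref + 0.1 * rng.standard_normal(200)       # lightly noisy estimate
g_ref = rng.uniform(0, 1, 64)                    # toy per-band gains
g_est = np.clip(g_ref + 0.05, 0, 1)

total = multi_objective([gain_mse(g_est, g_ref), neg_snr_db(est, ref)],
                        weights=[1.0, 0.1])
print(total)
```

The weights are hyperparameters that balance the terms' different scales (an MSE near zero vs. an SNR term in dB).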
no code implementations • 4 Mar 2022 • Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang
In recent years, exploring effective sound separation (SSep) techniques to improve overlapping sound event detection (SED) has attracted increasing attention.
1 code implementation • 27 Dec 2021 • Jiangyu Han, Yanhua Long, Lukas Burget, Jan Cernocky
In particular, we find that Mixture-Remix fine-tuning with DPCCN significantly outperforms TD-SpeakerBeam for unsupervised cross-domain TSE, with around 3.5 dB SI-SNR improvement on the target-domain test set and no source-domain performance degradation.
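One plausible reading of Mixture-Remix style adaptation is: separate unlabeled target-domain mixtures with a pretrained model, then remix the estimated sources to create synthetic mixtures whose "references" are known, enabling supervised fine-tuning. The sketch below follows that reading and may differ from the paper's exact recipe; the toy `separate` function is purely illustrative.

```python
import numpy as np

def mixture_remix_pairs(mixtures, separate, rng):
    """Build pseudo-supervised (mixture, references) pairs for
    target-domain fine-tuning by remixing estimated sources
    across utterances."""
    ests = [separate(m) for m in mixtures]             # each: (2, T) estimates
    order = rng.permutation(len(ests))
    pairs = []
    for i, j in zip(range(len(ests)), order):
        s1, s2 = ests[i][0], ests[j][1]                # cross-utterance remix
        pairs.append((s1 + s2, np.stack([s1, s2])))    # mixture, stacked refs
    return pairs

rng = np.random.default_rng(0)
mixtures = [rng.standard_normal(50) for _ in range(3)]
separate = lambda m: np.stack([0.6 * m, 0.4 * m])      # toy two-source separator
pairs = mixture_remix_pairs(mixtures, separate, rng)
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)
```

By construction, each new mixture is exactly the sum of its stacked references, which is what makes supervised losses applicable on unlabeled target-domain data.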
no code implementations • 6 Jun 2021 • Jiangyu Han, Wei Rao, Yannan Wang, Yanhua Long
Moreover, new combination strategies of the CD-based spatial information and target speaker adaptation of parallel encoder outputs are also investigated.
no code implementations • 26 Mar 2021 • Tiantian Tang, Xinyuan Zhou, Yanhua Long, Yijie Li, Jiaen Liang
Domain mismatch is a noteworthy issue in acoustic event detection tasks, as the target domain data is difficult to access in most real applications.
no code implementations • 23 Mar 2021 • Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang, Yuping Wang
A good joint training framework is very helpful for simultaneously improving the performance of weakly supervised audio tagging (AT) and acoustic event detection (AED).
1 code implementation • 19 Oct 2020 • Jiangyu Han, Xinyuan Zhou, Yanhua Long, Yijie Li
In this work, we propose two methods for exploiting the multi-channel spatial information to extract the target speech.
Speech Extraction • Audio and Speech Processing
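The snippet above does not say which multi-channel spatial features are used; a standard choice in target-speech extraction is the inter-channel phase difference (IPD), shown here purely as an illustration. Encoding the phase difference as cosine/sine pairs avoids phase-wrapping discontinuities.

```python
import numpy as np

def ipd_features(stft_ref, stft_other):
    """Inter-channel phase difference (IPD) between two channels'
    complex STFTs of shape (T, F), encoded as cos/sin -> (T, F, 2)."""
    phase_diff = np.angle(stft_ref) - np.angle(stft_other)
    return np.stack([np.cos(phase_diff), np.sin(phase_diff)], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
y = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
print(ipd_features(x, y).shape)  # (4, 6, 2)
```

Identical channels give a zero phase difference, i.e. cos = 1 and sin = 0 everywhere, which is a quick sanity check for the feature extraction.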
no code implementations • 19 Oct 2020 • Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang
Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues in a more efficient way.
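An attention-based pooling of a mixture embedding matrix can be sketched as follows. This is a hypothetical simplification of the ASA idea, not the paper's implementation: frames of the (T, D) embedding matrix are weighted by their similarity to a target-speaker query vector and summed into a single clue vector.

```python
import numpy as np

def attention_pool(embeddings, query):
    """Pool a (T, D) mixture embedding matrix into one D-dim vector,
    weighting frames by dot-product similarity to the speaker query."""
    d = query.shape[-1]
    scores = embeddings @ query / np.sqrt(d)   # (T,)
    w = np.exp(scores - scores.max())
    w /= w.sum()                               # softmax over frames
    return w @ embeddings                      # (D,)

rng = np.random.default_rng(0)
emb = rng.standard_normal((10, 16))        # toy frame embeddings
query = rng.standard_normal(16)            # toy target-speaker embedding
print(attention_pool(emb, query).shape)    # (16,)
```

With a zero query, all frames receive equal weight and the pooled vector reduces to the frame-wise mean, which is a useful sanity check on the softmax weighting.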