1 code implementation • 15 Sep 2023 • Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky
In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.
1 code implementation • 15 Mar 2023 • Yuguang Yang, Yu Pan, JingJing Yin, Jiangyu Han, Lei Ma, Heng Lu
SqueezeFormer has recently shown impressive performance in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 22 Nov 2022 • Xiaofeng Ge, Jiangyu Han, Haixin Guan, Yanhua Long
Recently, more and more personalized speech enhancement (PSE) systems with excellent performance have been proposed.
no code implementations • 31 Oct 2022 • Jiangyu Han, Yuhang Cao, Heng Lu, Yanhua Long
In recent years, speaker diarization has attracted widespread attention.
Automatic Speech Recognition (ASR) +4
no code implementations • 23 Apr 2022 • Jiangyu Han, Yanhua Long
SCT follows a framework using two heterogeneous neural networks (HNNs) to produce high confidence pseudo labels of unlabeled real speech mixtures.
no code implementations • 4 Mar 2022 • Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan
Finally, we propose to integrate the loss of complex subband gain, SNR, pitch filtering strength, and an OA loss in a multi-objective learning manner to further improve the speech enhancement performance.
1 code implementation • 27 Dec 2021 • Jiangyu Han, Yanhua Long, Lukas Burget, Jan Cernocky
Particularly, we find that the Mixture-Remix fine-tuning with DPCCN significantly outperforms the TD-SpeakerBeam for unsupervised cross-domain TSE, with around 3.5 dB SISNR improvement on target domain test set, without any source domain performance degradation.
no code implementations • 6 Jun 2021 • Jiangyu Han, Wei Rao, Yannan Wang, Yanhua Long
Moreover, new combination strategies of the CD-based spatial information and target speaker adaptation of parallel encoder outputs are also investigated.
1 code implementation • 2 Apr 2021 • Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang
The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing.
no code implementations • 19 Oct 2020 • Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang
Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues in a more efficient way.
1 code implementation • 19 Oct 2020 • Jiangyu Han, Xinyuan Zhou, Yanhua Long, Yijie Li
In this work, we propose two methods for exploiting the multi-channel spatial information to extract the target speech.
Speech Extraction Audio and Speech Processing