Search Results for author: Wei Rao

Found 11 papers, 4 with code

Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement

no code implementations · 17 Oct 2022 · Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang

Recently, multi-channel speech enhancement has drawn much interest because spatial information can be used to distinguish target speech from interfering signals.

Denoising · Speech Enhancement

Improving Channel Decorrelation for Multi-Channel Target Speech Extraction

no code implementations · 6 Jun 2021 · Jiangyu Han, Wei Rao, Yannan Wang, Yanhua Long

Moreover, new combination strategies of the CD-based spatial information and target speaker adaptation of parallel encoder outputs are also investigated.

Speech Extraction

Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech

1 code implementation · 30 Mar 2021 · Chenglin Xu, Wei Rao, Jibin Wu, Haizhou Li

Inspired by studies on target speaker extraction, e.g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech that is able to pay selective auditory attention to the target speaker.

Multi-Task Learning · Speaker Verification +1

Attention-based scaling adaptation for target speech extraction

no code implementations · 19 Oct 2020 · Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang

Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues in a more efficient way.
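The snippet above mentions pooling a mixture embedding matrix to exploit target-speaker clues. A minimal sketch of one such attention-style pooling is shown below; the function name, shapes, and the dot-product scoring rule are illustrative assumptions, not the ASA formulation from the paper.

```python
import numpy as np

# Hedged sketch of attention-style pooling over a mixture embedding matrix.
# The scoring rule (dot product with the target-speaker embedding) is an
# assumption for illustration, not the paper's ASA method.
def attention_pool(M, q):
    """Pool frame embeddings M (T, D) into one vector, weighting each
    frame by its similarity to the target-speaker embedding q (D,)."""
    scores = M @ q                      # (T,) similarity per frame
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()                        # attention weights sum to 1
    return w @ M                        # (D,) weighted average of frames

rng = np.random.default_rng(2)
M = rng.standard_normal((50, 16))       # 50 frames, 16-dim embeddings
q = rng.standard_normal(16)             # target-speaker embedding
pooled = attention_pool(M, q)           # single utterance-level vector
```

Because the weights form a convex combination, the pooled vector stays within the per-dimension range of the frame embeddings.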

Speech Extraction

Time-domain speaker extraction network

no code implementations · 29 Apr 2020 · Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Inaccurate phase estimation is inherent to frequency-domain processing, and it degrades the quality of signal reconstruction.
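The phase problem described above can be seen in a toy experiment: even with a perfect (oracle) magnitude spectrum, reusing the noisy mixture's phase leaves a residual reconstruction error. The example below is an illustration of that general point, not the paper's code, and the signal and noise level are arbitrary choices.

```python
import numpy as np

# Toy illustration: frequency-domain enhancers often estimate only the
# magnitude spectrum and reuse the noisy mixture's phase for resynthesis.
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n) / n
clean = np.sin(2 * np.pi * 50 * t)              # stand-in for target speech
mixture = clean + 0.5 * rng.standard_normal(n)  # noisy mixture

clean_mag = np.abs(np.fft.rfft(clean))          # oracle magnitude spectrum
mix_phase = np.angle(np.fft.rfft(mixture))      # phase taken from the mixture
est = np.fft.irfft(clean_mag * np.exp(1j * mix_phase), n=n)

phase_err = np.mean((est - clean) ** 2)         # error caused by mixture phase
exact_err = np.mean((np.fft.irfft(np.fft.rfft(clean), n=n) - clean) ** 2)
# phase_err exceeds exact_err: the mixture phase alone limits reconstruction.
```

Time-domain methods such as the network above sidestep this by never splitting the signal into magnitude and phase.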

Audio and Speech Processing · Sound

SpEx: Multi-Scale Time Domain Speaker Extraction Network

1 code implementation · 17 Apr 2020 · Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Inspired by Conv-TasNet, we propose a time-domain speaker extraction network (SpEx) that converts the mixture speech into multi-scale embedding coefficients instead of decomposing the speech signal into magnitude and phase spectra.
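The multi-scale idea above can be sketched as encoding the same waveform at several window lengths, so short windows give fine temporal resolution and long windows give finer spectral resolution. The sketch below uses plain framing in place of learned 1-D convolutions; the window sizes and hop are illustrative, not the paper's hyper-parameters.

```python
import numpy as np

# Hedged sketch of multi-scale time-domain encoding (not the SpEx code):
# slice the waveform into overlapping frames at several window scales,
# sharing one hop so the scales stay time-aligned.
def multi_scale_frames(x, win_lengths=(20, 80, 160), hop=10):
    """Return one (num_frames, window_length) matrix per window scale."""
    scales = []
    for L in win_lengths:
        n_frames = 1 + (len(x) - L) // hop
        frames = np.stack([x[i * hop : i * hop + L] for i in range(n_frames)])
        scales.append(frames)
    return scales

x = np.random.default_rng(1).standard_normal(1600)  # toy waveform
enc = multi_scale_frames(x)
# In SpEx-like models, each scale would then pass through a learned
# 1-D convolutional encoder instead of this raw framing.
```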

Multi-Task Learning

Fantastic 4 system for NIST 2015 Language Recognition Evaluation

no code implementations · 5 Feb 2016 · Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier

This article describes the systems jointly submitted by the Institute for Infocomm Research (I²R), the Laboratoire d'Informatique de l'Université du Maine (LIUM), Nanyang Technological University (NTU), and the University of Eastern Finland (UEF) for the 2015 NIST Language Recognition Evaluation (LRE).

