Search Results for author: Wei Rao

Found 11 papers, 4 with code

Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement

no code implementations · 17 Oct 2022 · Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang

Recently, multi-channel speech enhancement has drawn much interest because spatial information can be used to distinguish target speech from interfering signals.

Denoising · Speech Enhancement

Improving Channel Decorrelation for Multi-Channel Target Speech Extraction

no code implementations · 6 Jun 2021 · Jiangyu Han, Wei Rao, Yannan Wang, Yanhua Long

Moreover, new combination strategies of the CD-based spatial information and target speaker adaptation of parallel encoder outputs are also investigated.

Speech Extraction

Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech

1 code implementation · 30 Mar 2021 · Chenglin Xu, Wei Rao, Jibin Wu, Haizhou Li

Inspired by studies on target speaker extraction, e.g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech that is able to pay selective auditory attention to the target speaker.

Multi-Task Learning · Speaker Verification +1

Attention-based scaling adaptation for target speech extraction

no code implementations · 19 Oct 2020 · Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang

Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues in a more efficient way.
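The snippet above mentions pooling a mixture embedding matrix to exploit target-speaker clues. A minimal sketch of one such attention-style pooling is shown below; the function name, shapes, and the dot-product scoring rule are illustrative assumptions, not the ASA formulation from the paper.

```python
import numpy as np

# Hedged sketch of attention-style pooling over a mixture embedding matrix.
# The scoring rule (dot product with the target-speaker embedding) is an
# assumption for illustration, not the paper's ASA method.
def attention_pool(M, q):
    """Pool frame embeddings M (T, D) into one vector, weighting each
    frame by its similarity to the target-speaker embedding q (D,)."""
    scores = M @ q                      # (T,) similarity per frame
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()                        # attention weights sum to 1
    return w @ M                        # (D,) weighted average of frames

rng = np.random.default_rng(2)
M = rng.standard_normal((50, 16))       # 50 frames, 16-dim embeddings
q = rng.standard_normal(16)             # target-speaker embedding
pooled = attention_pool(M, q)           # single utterance-level vector
```

Because the weights form a convex combination, the pooled vector stays within the per-dimension range of the frame embeddings.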

Speech Extraction

Time-domain speaker extraction network

no code implementations · 29 Apr 2020 · Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Inaccurate phase estimation is inherent to frequency-domain processing, and it degrades the quality of signal reconstruction.
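The phase problem described above can be seen in a toy experiment: even with a perfect (oracle) magnitude spectrum, reusing the noisy mixture's phase leaves a residual reconstruction error. The example below is an illustration of that general point, not the paper's code, and the signal and noise level are arbitrary choices.

```python
import numpy as np

# Toy illustration: frequency-domain enhancers often estimate only the
# magnitude spectrum and reuse the noisy mixture's phase for resynthesis.
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n) / n
clean = np.sin(2 * np.pi * 50 * t)              # stand-in for target speech
mixture = clean + 0.5 * rng.standard_normal(n)  # noisy mixture

clean_mag = np.abs(np.fft.rfft(clean))          # oracle magnitude spectrum
mix_phase = np.angle(np.fft.rfft(mixture))      # phase taken from the mixture
est = np.fft.irfft(clean_mag * np.exp(1j * mix_phase), n=n)

phase_err = np.mean((est - clean) ** 2)         # error caused by mixture phase
exact_err = np.mean((np.fft.irfft(np.fft.rfft(clean), n=n) - clean) ** 2)
# phase_err exceeds exact_err: the mixture phase alone limits reconstruction.
```

Time-domain methods such as the network above sidestep this by never splitting the signal into magnitude and phase.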

Audio and Speech Processing · Sound

SpEx: Multi-Scale Time Domain Speaker Extraction Network

1 code implementation · 17 Apr 2020 · Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Inspired by Conv-TasNet, we propose a time-domain speaker extraction network (SpEx) that converts the mixture speech into multi-scale embedding coefficients instead of decomposing the speech signal into magnitude and phase spectra.
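The multi-scale idea above can be sketched as encoding the same waveform at several window lengths, so short windows give fine temporal resolution and long windows give finer spectral resolution. The sketch below uses plain framing in place of learned 1-D convolutions; the window sizes and hop are illustrative, not the paper's hyper-parameters.

```python
import numpy as np

# Hedged sketch of multi-scale time-domain encoding (not the SpEx code):
# slice the waveform into overlapping frames at several window scales,
# sharing one hop so the scales stay time-aligned.
def multi_scale_frames(x, win_lengths=(20, 80, 160), hop=10):
    """Return one (num_frames, window_length) matrix per window scale."""
    scales = []
    for L in win_lengths:
        n_frames = 1 + (len(x) - L) // hop
        frames = np.stack([x[i * hop : i * hop + L] for i in range(n_frames)])
        scales.append(frames)
    return scales

x = np.random.default_rng(1).standard_normal(1600)  # toy waveform
enc = multi_scale_frames(x)
# In SpEx-like models, each scale would then pass through a learned
# 1-D convolutional encoder instead of this raw framing.
```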

Multi-Task Learning

Fantastic 4 system for NIST 2015 Language Recognition Evaluation

no code implementations · 5 Feb 2016 · Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier

This article describes the systems jointly submitted by the Institute for Infocomm Research (I²R), the Laboratoire d'Informatique de l'Université du Maine (LIUM), Nanyang Technological University (NTU), and the University of Eastern Finland (UEF) for the 2015 NIST Language Recognition Evaluation (LRE).

