Speech Extraction

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

jyhan03/channel-decorrelation 23 Jan 2020

First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.

Multi-channel target speech extraction with channel decorrelation and target speaker adaptation

jyhan03/channel-decorrelation 19 Oct 2020

In this work, we propose two methods for exploiting the multi-channel spatial information to extract the target speech.

DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction

jyhan03/icassp22-dataset 27 Dec 2021

In particular, we find that Mixture-Remix fine-tuning with DPCCN significantly outperforms TD-SpeakerBeam for unsupervised cross-domain target speech extraction (TSE). It gives an SI-SNR improvement of around 3.5 dB on the target-domain test set without any performance degradation in the source domain.
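The scale-invariant signal-to-noise ratio (SI-SNR) improvement quoted above is the usual way TSE results are compared. The sketch below shows the standard definition. Both signals are zero-meaned, the estimate is projected onto the reference to get a scaled target component, and the ratio of target energy to residual energy is taken in dB. The improvement (SI-SNRi) is the estimate's SI-SNR minus the SI-SNR of the unprocessed mixture. The function names and the `eps` regularizer are illustrative assumptions, not code from the DPCCN repository.

```python
import math
from typing import List, Sequence


def _zero_mean(x: Sequence[float]) -> List[float]:
    """Remove the DC offset so the measure ignores constant shifts."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def si_snr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant SNR (dB) of an estimate with respect to a reference."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    # Whatever the scaled reference does not explain counts as error.
    error = [e - t for e, t in zip(est, target)]
    t_energy = sum(t * t for t in target)
    e_energy = sum(v * v for v in error)
    return 10.0 * math.log10((t_energy + eps) / (e_energy + eps))


def si_snr_improvement(estimate: Sequence[float], mixture: Sequence[float],
                       reference: Sequence[float]) -> float:
    """SI-SNRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

For example, if the reference is `[1, -1, 1, -1]` and the mixture adds an orthogonal interferer `[1, 1, -1, -1]` of equal energy, the mixture scores about 0 dB. An estimate that keeps only half of the interferer scores about 6.02 dB, so its SI-SNRi is also about 6.02 dB. Rescaling the estimate by any nonzero gain leaves the score unchanged, which is why the measure is called scale-invariant.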

On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement

sp-uhh/deep-non-linear-filter 22 Jun 2022

Employing deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement potentially has two key advantages over a traditional approach that combines a linear spatial filter with an independent tempo-spectral post-filter: 1) non-linear spatial filtering makes it possible to overcome restrictions that originate from a linear processing model, and 2) joint processing of spatial and tempo-spectral information makes it possible to exploit interdependencies between these different sources of information.
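To make the contrast concrete, here is a minimal sketch of the *traditional* pipeline the excerpt describes. The implementation choices are assumptions made for illustration. The linear spatial filter applies fixed per-frequency beamformer weights, Y[t][f] = Σ_c conj(w[f][c])·X[c][t][f], so its output is linear in the input. The post-filter is a simple Wiener-style gain computed from the single-channel beamformer output and an assumed known noise PSD, so it never sees the spatial information. The STFT layout `stft[c][t][f]`, the function names, and the Wiener rule are hypothetical choices. This is not the DNN-based non-linear filter from the sp-uhh repository.

```python
from typing import List, Sequence

# Hypothetical layout: stft[c][t][f] is the complex STFT value of
# channel c, frame t, frequency bin f.
Stft = List[List[List[complex]]]


def linear_spatial_filter(stft: Stft, weights: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """Beamform with fixed weights[f][c]: output is a linear combination of channels."""
    num_ch, num_frames, num_bins = len(stft), len(stft[0]), len(stft[0][0])
    return [
        [sum(weights[f][c].conjugate() * stft[c][t][f] for c in range(num_ch))
         for f in range(num_bins)]
        for t in range(num_frames)
    ]


def wiener_post_filter(y: List[List[complex]], noise_psd: Sequence[float],
                       floor: float = 0.0) -> List[List[complex]]:
    """Apply a per-bin real gain computed only from the single-channel output y."""
    out = []
    for row in y:
        new_row = []
        for f, val in enumerate(row):
            # Maximum-likelihood a priori SNR estimate, clipped at zero.
            snr = max(abs(val) ** 2 / max(noise_psd[f], 1e-12) - 1.0, 0.0)
            gain = max(snr / (1.0 + snr), floor)
            new_row.append(gain * val)
        out.append(new_row)
    return out
```

With two identical channels and equal weights of 0.5 (a delay-and-sum beamformer for a source with no inter-channel delay), the spatial filter returns the channel signal unchanged. The post-filter then only rescales each bin's magnitude. Because the two stages are separate and the first is linear, this design cannot exploit the interdependencies that the excerpt attributes to joint, non-linear DNN filtering.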

Analysis of impact of emotions on target speech extraction and speech separation

butspeechfit/ravdess2mix 15 Aug 2022

One of the factors causing such degradation may be intrinsic speaker variability, such as emotions, which commonly occur in realistic speech.

Neural Target Speech Extraction: An Overview

butspeechfit/speakerbeam 31 Jan 2023

Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers.