Speech Extraction
7 papers with code • 0 benchmarks • 2 datasets
Most implemented papers
Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam
First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.
Target Speech Extraction Based on Blind Source Separation and X-vector-based Speaker Selection Trained with Data Augmentation
Extracting the desired speech from a mixture is a meaningful and challenging task.
Multi-channel target speech extraction with channel decorrelation and target speaker adaptation
In this work, we propose two methods for exploiting the multi-channel spatial information to extract the target speech.
DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction
Particularly, we find that the Mixture-Remix fine-tuning with DPCCN significantly outperforms the TD-SpeakerBeam for unsupervised cross-domain TSE, with around 3.5 dB SISNR improvement on the target domain test set, without any source domain performance degradation.
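The SISNR figure quoted above refers to the scale-invariant signal-to-noise ratio, a common metric for separation and extraction quality. The sketch below shows one usual way to compute it, and it is not taken from the DPCCN paper. The function name `si_snr`, the mean-removal step, and the `eps` stabiliser are assumptions made for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Remove the mean of each signal (a common convention, assumed here).
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Whatever is left after removing the target component counts as noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy usage: a sinusoid as the reference, plus a small interfering tone.
reference = [math.sin(0.1 * n) for n in range(200)]
estimate = [r + 0.1 * math.cos(0.37 * n) for n, r in enumerate(reference)]
score_db = si_snr(estimate, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged. An "improvement" such as the one quoted above is the difference between two such scores in dB.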
On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement
Employing deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement potentially has two key advantages over a traditional approach that combines a linear spatial filter with an independent tempo-spectral post-filter: 1) non-linear spatial filtering can overcome potential restrictions originating from a linear processing model, and 2) joint processing of spatial and tempo-spectral information can exploit interdependencies between different sources of information.
Analysis of impact of emotions on target speech extraction and speech separation
One of the factors causing such degradation may be intrinsic speaker variability, such as emotions, occurring commonly in realistic speech.
Neural Target Speech Extraction: An Overview
Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers.