10 papers with code • 0 benchmarks • 3 datasets
These leaderboards are used to track progress in Speaker Separation
The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms.
In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.
In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation.
Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry.
Although the matrix determined by the output weights is dependent on a set of known speakers, we only use the input vectors during inference.
In this work, we introduce a new method---Neural Egg Separation---to tackle the scenario of extracting a signal from an unobserved distribution additively mixed with a signal from an observed distribution.
Simultaneous grouping is first performed in each time frame by separating the spectra of different speakers with a permutation-invariantly trained neural network.
Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss
We have open sourced our re-implementation of the DPRNN-TasNet here (https://github. com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation), and our TasTas is realized based on this implementation of DPRNN-TasNet, it is believed that the results in this paper can be reproduced with ease.
Speech separation has been well developed, with the very successful permutation invariant training (PIT) approach, although the frequent label assignment switching happening during PIT training remains to be a problem when better convergence speed and achievable performance are desired.
In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation and speaker identification in a single neural network.