Most previous methods have formulated the separation problem in the time-frequency representation of the mixed signal. This has several drawbacks: the phase and magnitude of the signal are decoupled, the time-frequency representation is suboptimal for speech separation, and computing the spectrograms introduces long latency.
Ranked #9 on Music Source Separation on MUSDB18
Music Source Separation, Speaker Separation, Speech Enhancement, Speech Separation
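A typical time-frequency masking pipeline estimates a real-valued mask for the magnitude spectrogram and reuses the mixture's phase for resynthesis. That reuse is the phase/magnitude decoupling mentioned above. The following is a minimal illustration under assumptions that do not come from the abstract:

- a naive NumPy STFT with a 256-sample Hann window;
- a hypothetical 8 kHz sampling rate;
- a random array standing in for a network's mask.

It is not the method of the paper quoted above. It also shows that one analysis frame spans 256 samples, or 32 ms at 8 kHz, and that window length bounds how quickly a frame-based system can respond.

```python
import numpy as np


def stft(x, win_len=256, hop=128):
    """Naive STFT: Hann-windowed frames -> one-sided spectra, shape (frames, win_len // 2 + 1)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def masked_estimate(mixture_spec, mask):
    """Scale the mixture magnitude by a real mask and reuse the mixture phase unchanged."""
    magnitude = np.abs(mixture_spec)
    phase = np.angle(mixture_spec)
    return mask * magnitude * np.exp(1j * phase)


SAMPLE_RATE = 8000  # hypothetical sampling rate
rng = np.random.default_rng(0)
mixture = rng.standard_normal(SAMPLE_RATE)  # 1 s of noise standing in for a mixture
spec = stft(mixture)
mask = rng.random(spec.shape)  # stand-in for a network's output in [0, 1)
estimate = masked_estimate(spec, mask)

# Unit phasors of estimate and mixture agree bin by bin wherever the mask is nonzero.
keep = mask > 0
same_phase = np.allclose(estimate[keep] / np.abs(estimate[keep]),
                         spec[keep] / np.abs(spec[keep]))
frame_latency_ms = 1000 * 256 / SAMPLE_RATE
print(spec.shape, same_phase, frame_latency_ms)  # (61, 129) True 32.0
```

Whatever mask the network produces, the estimate inherits the mixture phase. Phase errors in regions where speakers overlap are therefore never corrected by the mask itself.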
In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals by making use of a reference signal from the target speaker.
Speaker Recognition, Speaker Separation, Speech Enhancement, Speech Recognition
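A common way to let a separation network use a reference signal is to summarize the reference as a fixed-size speaker embedding. That embedding is then attached to every time frame of the mixture features before the mask is predicted. The sketch below shows only that conditioning step. The feature size (257), embedding size (256), and the random "embedding" are hypothetical, and the exact architecture of the system described above may differ.

```python
import numpy as np


def condition_on_speaker(mixture_features, speaker_embedding):
    """Concatenate a fixed reference embedding to every time frame.

    mixture_features: (frames, feat_dim) array computed from the mixture.
    speaker_embedding: (emb_dim,) vector computed from the target speaker's reference clip.
    """
    n_frames = mixture_features.shape[0]
    tiled = np.tile(speaker_embedding, (n_frames, 1))  # (frames, emb_dim), same vector per row
    return np.concatenate([mixture_features, tiled], axis=1)


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)


rng = np.random.default_rng(1)
features = rng.standard_normal((100, 257))          # hypothetical 100 frames x 257 bins
embedding = l2_normalize(rng.standard_normal(256))  # hypothetical 256-d speaker embedding
conditioned = condition_on_speaker(features, embedding)
print(conditioned.shape)  # (100, 513)
```

Because the same embedding is repeated at every frame, the downstream network sees the target speaker's identity throughout the utterance. Only the mixture part of each row changes over time.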
In this paper, we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task.
Deep Clustering, Speaker Separation, Speech Recognition, Speech Separation
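Given the task tags, the baseline here is presumably deep clustering. Deep clustering trains embeddings V so that their affinity matrix V V^T matches the ideal affinity Y Y^T built from one-hot speaker labels. A signal-approximation objective instead scores the reconstructed source directly.

The sketch below computes both losses in NumPy under illustrative assumptions: random embeddings and labels, and a signal-approximation loss in the magnitude domain. It does not reproduce how the described system connects the two objectives end to end. The deep clustering loss uses the standard expansion ||V V^T - Y Y^T||_F^2 = ||V^T V||_F^2 - 2||V^T Y||_F^2 + ||Y^T Y||_F^2, which avoids forming N x N matrices.

```python
import numpy as np


def deep_clustering_loss(V, Y):
    """Affinity loss ||V V^T - Y Y^T||_F^2, expanded so the N x N matrices are never built.

    V: (N, D) embeddings, one row per time-frequency bin.
    Y: (N, C) one-hot ideal speaker assignments for the same bins.
    """
    return (np.sum((V.T @ V) ** 2)
            - 2.0 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))


def signal_approximation_loss(mask, mixture_mag, source_mag):
    """Squared error between the masked mixture magnitude and the reference source magnitude."""
    return np.sum((mask * mixture_mag - source_mag) ** 2)


rng = np.random.default_rng(2)
N, D, C = 60, 5, 2  # hypothetical sizes: bins, embedding dim, speakers
V = rng.standard_normal((N, D))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # unit-norm embeddings
Y = np.eye(C)[rng.integers(0, C, size=N)]      # random one-hot labels

naive = np.sum((V @ V.T - Y @ Y.T) ** 2)       # direct N x N computation for comparison
print(np.isclose(deep_clustering_loss(V, Y), naive))  # True

mix = rng.random(N)
src = 0.5 * mix
print(signal_approximation_loss(np.full(N, 0.5), mix, src))  # 0.0
```

The expanded form costs O(N D^2) rather than O(N^2 D). This matters because N, the number of time-frequency bins, is large.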
Although the matrix formed by the output weights depends on a set of known speakers, we use only the input vectors during inference.
Simultaneous grouping is first performed in each time frame by separating the spectra of different speakers with a permutation-invariantly trained neural network.
Ranked #10 on Speech Separation on wsj0-2mix
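Permutation-invariant training avoids the arbitrary ordering of output channels. It scores every assignment of network outputs to reference speakers and keeps the cheapest one. Because the grouping above is performed in each time frame, this minimal sketch assumes a frame-level variant with a mean-squared-error criterion; array shapes and names are illustrative.

The chosen assignment can change from frame to frame. This is why frame-wise outputs still need a later grouping step across time.

```python
import itertools

import numpy as np


def frame_level_pit(estimates, references):
    """Per-frame permutation-invariant MSE.

    estimates, references: (n_speakers, frames, bins) arrays.
    Returns the mean best-permutation error and the permutation chosen for each frame.
    """
    n_spk, n_frames, _ = estimates.shape
    perms = list(itertools.permutations(range(n_spk)))
    total = 0.0
    chosen = []
    for t in range(n_frames):
        # Error of each output-to-reference assignment on frame t.
        errs = [np.mean((estimates[list(p), t] - references[:, t]) ** 2) for p in perms]
        best = int(np.argmin(errs))
        total += errs[best]
        chosen.append(perms[best])
    return total / n_frames, chosen


rng = np.random.default_rng(3)
references = rng.standard_normal((2, 4, 3))  # 2 speakers, 4 frames, 3 bins
estimates = references.copy()
estimates[:, [1, 3]] = references[::-1][:, [1, 3]]  # outputs swap order on frames 1 and 3

loss, chosen = frame_level_pit(estimates, references)
print(loss, chosen)  # 0.0 [(0, 1), (1, 0), (0, 1), (1, 0)]
```

The loss is zero even though the output order flips between frames. Frame-level PIT alone therefore does not track which output belongs to which speaker over time.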
How to stably select correct label permutations is a long-standing problem.
Ranked #3 on Speech Separation on wsj0-2mix
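One standard way to keep the label assignment consistent is utterance-level permutation-invariant training. It picks a single permutation for the whole utterance instead of one per frame, so the output order cannot flip midway.

The sketch below illustrates that general idea with an MSE criterion and made-up arrays. It is not the method of the paper quoted above, which may address permutation stability differently.

```python
import itertools

import numpy as np


def utterance_level_pit(estimates, references):
    """One permutation for the whole utterance, chosen by minimum MSE.

    estimates, references: (n_speakers, frames, bins) arrays.
    Returns the best error and the permutation that achieves it.
    """
    perms = list(itertools.permutations(range(estimates.shape[0])))
    # Each permutation is scored over all frames at once.
    errs = [np.mean((estimates[list(p)] - references) ** 2) for p in perms]
    best = int(np.argmin(errs))
    return errs[best], perms[best]


rng = np.random.default_rng(4)
references = rng.standard_normal((2, 100, 8))  # 2 speakers, 100 frames, 8 bins
# Outputs come out in swapped order with a little noise added.
estimates = references[::-1] + 0.01 * rng.standard_normal(references.shape)

loss, perm = utterance_level_pit(estimates, references)
print(perm)  # (1, 0)
```

Scoring the permutation over the full utterance trades the per-frame flexibility of frame-level PIT for a single, consistent speaker-to-output mapping.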
In this work, we introduce a new method, Neural Egg Separation, to tackle the scenario of extracting a signal from an unobserved distribution that is additively mixed with a signal from an observed distribution.