Speech separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interference, such as music or noise, is not the main concern.
The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms.
Ranked #9 on Music Source Separation on MUSDB18
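The decoupling of magnitude and phase mentioned in the entry above can be illustrated with a small sketch. It assumes synthetic sinusoids as stand-ins for speech, a hand-written Hann-windowed STFT, and an oracle ratio mask; the names `stft`, `mask1`, and `S1_hat` are illustrative and not taken from any of the listed papers.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Two synthetic "sources" (assumed stand-ins for speech) and their mixture.
sr = 8000
t = np.arange(sr) / sr
s1 = np.sin(2 * np.pi * 440 * t)
s2 = 0.5 * np.sin(2 * np.pi * 1250 * t)
mix = s1 + s2

S1, S2, X = stft(s1), stft(s2), stft(mix)

# Oracle ratio mask computed from magnitudes only.
eps = 1e-8
mask1 = np.abs(S1) / (np.abs(S1) + np.abs(S2) + eps)

# The estimate rescales the mixture magnitude but reuses the mixture phase,
# so the phase of the separated source is never estimated.
S1_hat = mask1 * np.abs(X) * np.exp(1j * np.angle(X))
```

Because the mask only touches the magnitude, any phase error in the mixture carries over to the estimate, which is one of the drawbacks the abstract attributes to time-frequency methods.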
By introducing an improved transformer, elements in speech sequences can interact directly, which enables DPTNet to model speech sequences with direct context-awareness.
Ranked #5 on Speech Separation on wsj0-2mix
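The direct interaction between sequence elements described in the entry above is the defining property of self-attention. The following is a minimal sketch of standard single-head scaled dot-product self-attention, not the improved transformer of DPTNet itself; the random weights and the dimensions `T` and `d` are assumptions chosen for illustration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x of shape (T, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # (T, T) score matrix: every frame is compared with every other frame.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 6, 8  # hypothetical sequence length and feature size
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` shows how much one frame draws on every other frame in a single step, in contrast to a recurrent model, where information has to pass through intermediate states.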
Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.
In the first step, we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal.
Ranked #13 on Speech Separation on wsj0-2mix
Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches over conventional time-frequency-based methods.
Ranked #9 on Speech Separation on wsj0-2mix
We investigate the recently proposed Time-domain Audio Separation Network (TasNet) in the task of real-time single-channel speech dereverberation.
Ranked #15 on Speech Separation on wsj0-2mix
We directly model the signal in the time-domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs.
Ranked #17 on Speech Separation on wsj0-2mix
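The time-domain encoder-decoder idea in the entry above can be sketched as follows. This is an illustrative sketch under strong assumptions, not the published model: the encoder and decoder bases are random and untrained, the signals are random noise, and oracle masks computed from the known sources stand in for a learned separation module. All names (`frame`, `overlap_add`, `encode`, `enc`, `dec`) are hypothetical.

```python
import numpy as np

def frame(x, win, hop):
    """Split a waveform into overlapping frames, shape (n_frames, win)."""
    n = 1 + (len(x) - win) // hop
    return np.stack([x[i * hop : i * hop + win] for i in range(n)])

def overlap_add(frames, hop):
    """Reassemble a waveform by summing overlapping frames."""
    n, win = frames.shape
    out = np.zeros((n - 1) * hop + win)
    for i in range(n):
        out[i * hop : i * hop + win] += frames[i]
    return out

rng = np.random.default_rng(0)
win, hop, n_basis = 40, 20, 64
enc = rng.standard_normal((win, n_basis))  # hypothetical, untrained encoder basis
dec = rng.standard_normal((n_basis, win))  # hypothetical, untrained decoder basis

def encode(x):
    # ReLU keeps the encoder outputs nonnegative.
    return np.maximum(frame(x, win, hop) @ enc, 0.0)

s1 = rng.standard_normal(800)
s2 = rng.standard_normal(800)
mix = s1 + s2

w_mix = encode(mix)
w1, w2 = encode(s1), encode(s2)

# Oracle masks (assumption): a trained network would predict these from w_mix.
eps = 1e-8
masks = np.stack([w1, w2]) / (w1 + w2 + eps)

# Mask the mixture representation, decode each frame, and overlap-add.
estimates = [overlap_add((m * w_mix) @ dec, hop) for m in masks]
```

With untrained bases the estimates are not meaningful separations; the sketch only shows the data flow from waveform to nonnegative representation, to masked representation, and back to a waveform, without computing a spectrogram.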