About

Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem, in which only the overlapping speech sources are of interest; other interferences, such as music or noise signals, are not the main concern of the study.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
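
As a minimal illustration of the task setup (a hypothetical NumPy sketch; the signals here are random placeholders for real speech), the observed mixture is the sum of the individual speakers, and the separator must recover each source from the mixture alone:

```python
import numpy as np

# Hypothetical example: two "speakers" represented by random signals.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)   # speaker 1, 1 second at 16 kHz
s2 = rng.standard_normal(16000)   # speaker 2

# The observed signal is the overlap (sum) of the individual sources.
mixture = s1 + s2

# A speech separation system takes only `mixture` as input and must
# output estimates of s1 and s2 (up to an arbitrary permutation).
```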

Benchmarks


Subtasks

Datasets

Greatest papers with code

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

20 Sep 2018 facebookresearch/demucs

The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms.
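
To make the phase/magnitude decoupling concrete, here is a rough sketch (not the paper's method) of the conventional time-frequency masking pipeline that Conv-TasNet avoids: a real-valued mask is estimated on the magnitude spectrogram, while the mixture phase is reused at reconstruction.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(0)
mixture = rng.standard_normal(fs)          # placeholder for a 1 s mixture

# Conventional pipeline: work on the complex spectrogram...
_, _, Z = stft(mixture, fs=fs, nperseg=512)
magnitude, phase = np.abs(Z), np.angle(Z)

# ...estimate a real-valued mask on the magnitude only (here: a dummy mask).
mask = np.ones_like(magnitude) * 0.5

# Reconstruction reuses the *mixture* phase, which limits quality even with
# an ideal magnitude mask -- one of the drawbacks Conv-TasNet sidesteps by
# operating directly on the waveform.
estimate_spec = (mask * magnitude) * np.exp(1j * phase)
_, estimate = istft(estimate_spec, fs=fs, nperseg=512)
```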

MUSIC SOURCE SEPARATION SPEAKER SEPARATION SPEECH ENHANCEMENT SPEECH SEPARATION

Attention is All You Need in Speech Separation

25 Oct 2020 speechbrain/speechbrain

Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.
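
As a rough illustration of that idea (not the SepFormer code itself), sequential recurrent processing of a frame sequence can be swapped for a multi-head self-attention layer; the sketch below assumes a PyTorch environment and arbitrary tensor sizes:

```python
import torch
import torch.nn as nn

batch, frames, dim = 4, 200, 256          # arbitrary sizes for illustration
x = torch.randn(batch, frames, dim)

# Recurrent modeling: frames are processed sequentially.
rnn = nn.LSTM(input_size=dim, hidden_size=dim, batch_first=True)
rnn_out, _ = rnn(x)

# Attention-based modeling: every frame attends to every other frame
# in parallel, with no recurrence.
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
attn_out, _ = attn(x, x, x)
```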

SPEECH SEPARATION

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

Interspeech 2020 mpariente/asteroid

By introducing an improved transformer, elements in speech sequences can interact directly, which enables DPTNet to model speech sequences with direct context-awareness.

SPEECH SEPARATION AUDIO AND SPEECH PROCESSING SOUND

Sudo rm -rf: Efficient Networks for Universal Audio Source Separation

14 Jul 2020 mpariente/asteroid

In this paper, we present an efficient neural network for end-to-end general purpose audio source separation.

AUDIO SOURCE SEPARATION SPEECH SEPARATION

Filterbank design for end-to-end speech separation

23 Oct 2019 mpariente/asteroid

Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.
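
A common way to parameterize the analysis filterbank in this line of work is a 1-D convolution whose kernels are learned jointly with the separator; the sketch below uses assumed sizes rather than the paper's exact configuration, and differs from an STFT-like front end only in that its filters are free parameters:

```python
import torch
import torch.nn as nn

n_filters, kernel_size, stride = 512, 32, 16   # assumed hyperparameters

# Learnable ("free") analysis filterbank: each output channel is one filter.
encoder = nn.Conv1d(1, n_filters, kernel_size=kernel_size,
                    stride=stride, bias=False)

waveform = torch.randn(1, 1, 16000)            # (batch, channel, samples)
representation = encoder(waveform)             # (batch, n_filters, frames)

# Masks are applied in this learned domain; a matching transposed
# convolution can serve as the synthesis filterbank (decoder).
decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size=kernel_size,
                             stride=stride, bias=False)
reconstruction = decoder(representation)
```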

SPEAKER RECOGNITION SPEECH SEPARATION

Two-Step Sound Source Separation: Training on Learned Latent Targets

22 Oct 2019 mpariente/asteroid

In the first step, we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracle masks is optimal.
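
In other words, once an encoder/decoder pair (the transform and its inverse) has been trained, separation targets can be defined directly in that latent space. A simplified sketch of how an oracle-style mask might be formed there (hypothetical shapes and untrained modules, purely illustrative):

```python
import torch
import torch.nn as nn

encoder = nn.Conv1d(1, 256, kernel_size=16, stride=8, bias=False)
decoder = nn.ConvTranspose1d(256, 1, kernel_size=16, stride=8, bias=False)

s1 = torch.randn(1, 1, 16000)      # clean source 1 (placeholder)
s2 = torch.randn(1, 1, 16000)      # clean source 2 (placeholder)
mix = s1 + s2

# Step 1 (assumed already done): train encoder/decoder so that
# decoder(encoder(x)) ~= x.

# Step 2: build oracle-style masks in the learned latent space and use
# them as training targets for the separation network.
w_mix, w_s1, w_s2 = encoder(mix), encoder(s1), encoder(s2)
oracle_mask_1 = w_s1.abs() / (w_s1.abs() + w_s2.abs() + 1e-8)

estimate_1 = decoder(oracle_mask_1 * w_mix)   # upper-bound estimate of s1
```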

SPEECH SEPARATION

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

14 Oct 2019 mpariente/asteroid

Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches to conventional time-frequency-based methods.
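
The dual-path idea behind this model handles very long time-domain sequences by folding them into short chunks and alternating an intra-chunk RNN with an inter-chunk RNN. A condensed sketch with assumed sizes (overlap and bidirectionality omitted for brevity; not the reference implementation):

```python
import torch
import torch.nn as nn

feat, frames, chunk = 64, 1000, 100      # assumed sizes
x = torch.randn(1, frames, feat)         # long frame sequence

# Fold the long sequence into (num_chunks, chunk, feat); real DPRNN uses
# 50% overlapping chunks.
chunks = x.view(1, frames // chunk, chunk, feat)
b, n, k, f = chunks.shape

intra_rnn = nn.LSTM(feat, feat, batch_first=True)
inter_rnn = nn.LSTM(feat, feat, batch_first=True)

# Intra-chunk pass: model short-range structure within each chunk.
intra_out, _ = intra_rnn(chunks.reshape(b * n, k, f))
intra_out = intra_out.reshape(b, n, k, f)

# Inter-chunk pass: model long-range structure across chunks
# (the sequence length seen by this RNN is only n = frames / chunk).
inter_in = intra_out.permute(0, 2, 1, 3).reshape(b * k, n, f)
inter_out, _ = inter_rnn(inter_in)
```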

SPEECH SEPARATION

Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network

ISCA Interspeech 2018 mpariente/asteroid

We investigate the recently proposed Time-domain Audio Separation Network (TasNet) in the task of real-time single-channel speech dereverberation.

DENOISING SPEECH DEREVERBERATION SPEECH SEPARATION

Alternative Objective Functions for Deep Clustering

ICASSP 2018 mpariente/asteroid

The recently proposed deep clustering framework represents a significant step towards solving the cocktail party problem.
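
For reference, the standard deep clustering objective that the alternatives in this paper are compared against trains a network to embed each time-frequency bin so that bins dominated by the same speaker get similar embeddings. A compact sketch of that affinity loss (assumed shapes, random placeholder tensors standing in for network outputs):

```python
import torch
import torch.nn.functional as F

tf_bins, emb_dim, n_spk = 3000, 20, 2    # assumed sizes

# V: unit-norm embedding per time-frequency bin (normally a BLSTM output).
V = F.normalize(torch.randn(tf_bins, emb_dim), dim=1)
# Y: one-hot dominant-speaker assignment per bin (the ideal binary mask).
Y = F.one_hot(torch.randint(n_spk, (tf_bins,)), n_spk).float()

# Deep clustering loss: || V V^T - Y Y^T ||_F^2, computed in the
# memory-friendly form that avoids the (tf_bins x tf_bins) affinity matrix.
loss = (V.T @ V).pow(2).sum() + (Y.T @ Y).pow(2).sum() \
       - 2 * (V.T @ Y).pow(2).sum()
```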

DEEP CLUSTERING SPEECH SEPARATION

TasNet: time-domain audio separation network for real-time, single-channel speech separation

1 Nov 2017 mpariente/asteroid

We directly model the signal in the time-domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs.
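
A stripped-down sketch of that idea (assumed sizes, untrained modules; not the authors' implementation): the waveform is encoded with a 1-D convolution, forced nonnegative, masked once per source, and decoded back to the time domain.

```python
import torch
import torch.nn as nn

n_basis, kernel, stride, n_src = 256, 40, 20, 2   # assumed hyperparameters

encoder = nn.Conv1d(1, n_basis, kernel, stride=stride, bias=False)
decoder = nn.ConvTranspose1d(n_basis, 1, kernel, stride=stride, bias=False)
# Stand-in for the separation module that predicts one mask per source.
mask_net = nn.Conv1d(n_basis, n_basis * n_src, kernel_size=1)

mixture = torch.randn(1, 1, 16000)                # (batch, 1, samples)
weights = torch.relu(encoder(mixture))            # nonnegative encoder outputs
masks = torch.sigmoid(mask_net(weights))          # (batch, n_basis*n_src, frames)
masks = masks.view(1, n_src, n_basis, -1)

# Apply each mask in the learned domain and decode back to waveforms.
estimates = [decoder(masks[:, i] * weights) for i in range(n_src)]
```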

SPEECH SEPARATION