Speech Separation
96 papers with code • 18 benchmarks • 16 datasets
Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the general source separation problem: the focus is solely on the overlapping speech sources, while other interference such as music or noise signals is not the main concern.
Source: A Unified Framework for Speech Separation
Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
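Separation quality is commonly reported with the scale-invariant signal-to-noise ratio (SI-SNR), which scores an estimated source against its reference independently of overall gain. Below is a minimal NumPy sketch of this metric; the function name `si_snr` and the test signal are illustrative, not from any particular paper on this page.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB, a standard speech-separation metric.
    Both signals are zero-meaned before projection."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the scale-invariant target
    target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    noise = est - target
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

t = np.linspace(0, 1, 8000)
ref = np.sin(2 * np.pi * 440 * t)
print(si_snr(0.5 * ref, ref))  # rescaling the estimate leaves SI-SNR essentially unchanged
```

Because the metric projects out the gain, a correctly separated source scores high even if its amplitude is wrong, which is why SI-SNR (and the improvement SI-SNRi over the unprocessed mixture) is the headline number on most of the benchmarks listed here.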
Libraries
Use these libraries to find Speech Separation models and implementations
Latest papers
SPMamba: State-space model is all you need in speech separation
Notably, within computer vision, Mamba-based methods have been celebrated for their formidable performance and reduced computational requirements.
Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation
In this work, we replace transformers with Mamba, a selective state space model, for speech separation.
Online speaker diarization of meetings guided by speech separation
The results show that our system improves the state-of-the-art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
TDANet serves as the architectural foundation for the auditory and visual networks within TDFNet, offering an efficient model with fewer parameters.
On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments
Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation.
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation
This is the first time-frequency domain audio-visual speech separation method to outperform all contemporary time-domain counterparts.
SPGM: Prioritizing Local Features for enhanced speech separation performance
Dual-path is a popular architecture for speech separation models (e.g. Sepformer). It splits long sequences into overlapping chunks, with intra-blocks modelling local features within each chunk and inter-blocks modelling global relationships across chunks.
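The chunking step that dual-path models share can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not code from SPGM or Sepformer: it pads a (time, feature) sequence and stacks 50%-overlapping chunks, the tensor that intra-blocks (within a chunk) and inter-blocks (across chunks at the same position) then operate on.

```python
import numpy as np

def split_into_chunks(x, chunk_size, hop):
    """Split a (time, feature) sequence into overlapping chunks.

    Returns an array of shape (num_chunks, chunk_size, feature),
    zero-padding the end so every chunk is full length.
    """
    T, F = x.shape
    n_chunks = max(1, int(np.ceil((T - chunk_size) / hop)) + 1)
    padded = np.zeros((hop * (n_chunks - 1) + chunk_size, F))
    padded[:T] = x
    return np.stack([padded[i * hop : i * hop + chunk_size]
                     for i in range(n_chunks)])

x = np.random.randn(100, 8)          # 100 frames, 8 features
chunks = split_into_chunks(x, chunk_size=32, hop=16)  # 50% overlap
print(chunks.shape)                  # (6, 32, 8)
```

With this layout, an intra-block attends over axis 1 (within each chunk of 32 frames) while an inter-block attends over axis 0, which is how a dual-path model covers long sequences without quadratic cost over the full length.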
Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model
We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments.
A Neural State-Space Model Approach to Efficient Speech Separation
In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions
To effectively solve the indirect elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs a full-computation self-attention on local chunks and a linearised low-cost self-attention over the full sequence.