Speech Separation

96 papers with code • 18 benchmarks • 16 datasets

Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interference, such as music or noise signals, is not the main concern.
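
To make the setup concrete, here is a minimal sketch (not from any of the listed papers) of the separation problem: a mixture is the sum of speaker signals, and separation quality is commonly scored with the scale-invariant signal-to-noise ratio (SI-SNR). The sinusoidal "speakers" are purely illustrative.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB): project the
    estimate onto the target, then compare the projected signal
    energy to the residual energy."""
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# two hypothetical "speakers" as sinusoids; the observed mixture is their sum
t = np.linspace(0, 1, 8000)
s1 = np.sin(2 * np.pi * 220 * t)
s2 = np.sin(2 * np.pi * 330 * t)
mixture = s1 + s2

print(si_snr(s1, s1))       # very high: near-perfect reconstruction
print(si_snr(mixture, s1))  # low: the other speaker acts as interference
```

A separation model is trained to map `mixture` back to estimates of `s1` and `s2`, typically maximizing SI-SNR (with permutation-invariant matching of estimates to targets).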

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks

SPMamba: State-space model is all you need in speech separation

jusperlee/spmamba 2 Apr 2024

Notably, within computer vision, Mamba-based methods have been celebrated for their formidable performance and reduced computational requirements.

Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation

xi-j/Mamba-TasNet 27 Mar 2024

In this work, we replace transformers with Mamba, a selective state space model, for speech separation.

Online speaker diarization of meetings guided by speech separation

egruttadauria98/sspavaldo 30 Jan 2024

The results show that our system improves the state-of-the-art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).

TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion

spkgyk/TDFNet 25 Jan 2024

TDANet serves as the architectural foundation for the auditory and visual networks within TDFNet, offering an efficient model with fewer parameters.

On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

jwr1995/pubsep 9 Oct 2023

Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation.

RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation

spkgyk/RTFS-Net 29 Sep 2023

This is the first time-frequency domain audio-visual speech separation method to outperform all contemporary time-domain counterparts.

SPGM: Prioritizing Local Features for enhanced speech separation performance

yipjiaqi/spgm 22 Sep 2023

Dual-path is a popular architecture for speech separation models (e.g., Sepformer) that splits long sequences into overlapping chunks; its intra- and inter-blocks then separately model intra-chunk local features and inter-chunk global relationships.

Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model

hmartelb/avlit 31 May 2023

We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments.

A Neural State-Space Model Approach to Efficient Speech Separation

JusperLee/S4M 26 May 2023

In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).
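
The primitive shared by S4M-, SPMamba-, and Dual-path-Mamba-style layers is a discrete linear state-space recurrence. The sketch below is a plain sequential scan with hypothetical parameters, not any paper's actual layer (real models use learned, structured matrices and parallel or convolutional evaluation).

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run the discrete linear state-space recurrence
    x[k] = A @ x[k-1] + B * u[k],  y[k] = C @ x[k],
    sequentially over an input sequence u."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k
        ys.append(C @ x)
    return np.array(ys)

# hypothetical 2-state system with a decaying diagonal transition
A = np.diag([0.9, 0.5])
B = np.array([1.0, 1.0])
C = np.array([0.5, 0.5])
y = ssm_scan(A, B, C, np.ones(100))
print(y[-1])  # converges toward the steady-state gain C @ inv(I - A) @ B = 6.0
```

Selective SSMs such as Mamba make `A`, `B`, and `C` input-dependent at each step, which is what lets them gate information flow like attention while keeping linear-time complexity.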

MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions

alibabasglab/mossformer 23 Feb 2023

To effectively solve the indirect elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs a full-computation self-attention on local chunks and a linearised low-cost self-attention over the full sequence.
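
A minimal sketch of this joint pattern: full softmax attention inside each chunk, plus a linearised attention pass over the whole sequence. The elu+1 feature map is one common linearisation choice; MossFormer's exact gated formulation differs, and all names here are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def local_attention(Q, K, V, chunk):
    """Full O(chunk^2) self-attention inside each non-overlapping chunk."""
    T, d = Q.shape
    out = np.zeros_like(V)
    for s in range(0, T, chunk):
        q, k, v = Q[s:s + chunk], K[s:s + chunk], V[s:s + chunk]
        out[s:s + chunk] = softmax(q @ k.T / np.sqrt(d)) @ v
    return out

def linear_attention(Q, K, V, eps=1e-6):
    """Linearised O(T) attention over the full sequence via a positive
    feature map phi(x) = elu(x) + 1 (an assumption, not MossFormer's form)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(Q), phi(K)
    kv = k.T @ V                    # (d, d_v) summary of the whole sequence
    norm = q @ k.sum(axis=0) + eps  # per-position normaliser
    return (q @ kv) / norm[:, None]

rng = np.random.default_rng(0)
T, d = 16, 8
Q, K, V = rng.standard_normal((3, T, d))
# joint output: exact local detail within chunks plus cheap global context
y = local_attention(Q, K, V, chunk=4) + linear_attention(Q, K, V)
print(y.shape)  # (16, 8)
```

The point of the combination is complexity: the local pass is quadratic only in the chunk size, and the global pass is linear in sequence length, so together they cover both ranges without full O(T^2) attention.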
