Speech Separation

96 papers with code • 18 benchmarks • 16 datasets

Speech separation is the task of extracting every overlapping speech source from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interference, such as music or noise, is not the main concern of the study.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
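
To make the definition concrete, the sketch below builds a two-source mixture by sample-wise addition, which is a common simplifying assumption rather than something the source above specifies. The sine tones standing in for speakers, the sample rate, and the helper names `tone` and `mix` are all hypothetical choices for illustration.

```python
import math

SAMPLE_RATE = 8000  # Hz; an arbitrary choice for this illustration


def tone(freq_hz, num_samples, amplitude=0.5):
    """Stand-in for one speaker's signal: a pure sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(num_samples)]


def mix(sources):
    """Sample-wise sum of equal-length sources: x[n] = sum_k s_k[n]."""
    return [sum(samples) for samples in zip(*sources)]


speaker_a = tone(220.0, 160)
speaker_b = tone(330.0, 160)
mixture = mix([speaker_a, speaker_b])
# A separation system would receive only `mixture` and would have to
# estimate speaker_a and speaker_b from it.
```

Under this additive assumption, a perfect separator's outputs would sum back to the mixture, which is why many systems are trained on synthetic mixtures built this way.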

Latest papers with no code

Robust Active Speaker Detection in Noisy Environments

no code yet • 27 Mar 2024

Experiments demonstrate that non-speech audio noises significantly impact ASD models, and our proposed approach improves ASD performance in noisy environments.

PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings

no code yet • 4 Mar 2024

A major drawback of supervised speech separation (SSep) systems is their reliance on synthetic data, leading to poor real-world generalization.

Probing Self-supervised Learning Models with Target Speech Extraction

no code yet • 17 Feb 2024

TSE uniquely requires both speaker identification and speech separation, distinguishing it from other tasks in the Speech processing Universal PERformance Benchmark (SUPERB) evaluation.

Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation

no code yet • 14 Feb 2024

We propose mixture to mixture (M2M) training, a weakly-supervised neural speech separation algorithm that leverages close-talk mixtures as a weak supervision for training discriminative models to separate far-field mixtures.

Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

no code yet • 23 Jan 2024

We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers.

Resource-constrained stereo singing voice cancellation

no code yet • 22 Jan 2024

We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix.

Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization

no code yet • 16 Jan 2024

The proposed method can take audio-visual input and leverage the speaker's acoustic footprint or lip track to flexibly conduct audio-based, video-based, and audio-visual speaker diarization in a unified sequence-to-sequence framework.

Hyperbolic Distance-Based Speech Separation

no code yet • 7 Jan 2024

In this work, we explore the task of hierarchical distance-based speech separation defined on a hyperbolic manifold.

Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments

no code yet • 7 Jan 2024

Speech separation involves extracting an individual speaker's voice from a multi-speaker audio signal.

Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation

no code yet • 20 Nov 2023

Despite its success, previous studies showed that PIT is plagued by excessive label assignment switching in adjacent epochs, impeding the model from learning better label assignments.
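
For context, permutation invariant training (PIT) scores a separator's outputs against the reference sources under every possible output-to-speaker assignment and trains on the lowest-loss one. The following is a minimal sketch of that idea, assuming mean squared error as the per-pair loss and plain Python lists as signals. The function names `mse` and `pit_loss` are hypothetical, and this is not the method proposed in the paper above.

```python
from itertools import permutations


def mse(estimate, reference):
    """Mean squared error between two equal-length signals."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


def pit_loss(estimates, references):
    """Return (best_loss, best_assignment) over all speaker permutations.

    best_assignment[i] is the index of the reference matched to estimates[i].
    """
    n = len(references)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average the pairwise losses under this candidate assignment.
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


references = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
estimates = [[0.5, 0.5, 0.5], [1.0, 0.0, -1.0]]  # outputs in swapped order
loss, assignment = pit_loss(estimates, references)
```

Here the outputs match the references exactly but in swapped order, so the chosen assignment is the swapped one. The label assignment switching described above corresponds to this chosen assignment changing for the same training sample from one epoch to the next. The exhaustive search grows factorially with the number of speakers, so it is only practical for small speaker counts.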