Speech Separation
94 papers with code • 18 benchmarks • 16 datasets
Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem, where the focus is only on the overlapping speech sources; other interferences such as music or noise signals are not the main concern of the study.
Source: A Unified Framework for Speech Separation
Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
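To make the task concrete, the sketch below builds a toy two-source mixture and separates it with an ideal ratio mask on the STFT, the classic oracle formulation that many of the papers listed here learn to approximate. The signals (two sinusoids standing in for "speakers") and all parameter choices are illustrative assumptions, not taken from any specific paper.

```python
import numpy as np
from scipy.signal import stft, istft

# Toy example: two synthetic "speakers" (sinusoids at different
# frequencies) are mixed, then separated with an ideal ratio mask.
fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)    # source 1
s2 = np.sin(2 * np.pi * 1320 * t)   # source 2
mix = s1 + s2

_, _, S1 = stft(s1, fs=fs, nperseg=256)
_, _, S2 = stft(s2, fs=fs, nperseg=256)
_, _, M = stft(mix, fs=fs, nperseg=256)

# Ideal ratio mask: fraction of mixture magnitude belonging to source 1.
eps = 1e-8
irm1 = np.abs(S1) / (np.abs(S1) + np.abs(S2) + eps)

# Apply the mask to the mixture spectrogram and invert back to time domain.
_, est1 = istft(irm1 * M, fs=fs, nperseg=256)
est1 = est1[:len(s1)]

def snr(ref, est):
    """Signal-to-noise ratio of an estimate against a reference, in dB."""
    return 10 * np.log10(np.sum(ref**2) / np.sum((ref - est)**2))

print(f"SNR of raw mixture vs s1: {snr(s1, mix):.1f} dB")
print(f"SNR after masking:        {snr(s1, est1):.1f} dB")
```

In a real separation system the oracle mask is unavailable; a network is trained to predict it (or the waveform directly) from the mixture alone.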
Libraries
Use these libraries to find Speech Separation models and implementations

Most implemented papers
Compute and memory efficient universal sound source separation
Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem.
Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising.
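The joint-optimization idea can be pictured as a deterministic soft time-frequency masking layer placed after the network: the predicted source magnitudes are normalized into ratio masks and applied to the mixture spectrogram, so mask estimation and separation are trained together. A minimal sketch, assuming the network outputs magnitude estimates z1 and z2 for the two sources (the function name and shapes are illustrative):

```python
import numpy as np

def soft_mask_layer(z1, z2, mix_spec, eps=1e-8):
    """Normalize two predicted source magnitudes into ratio masks and
    apply them to the mixture spectrogram. Because the operation is
    differentiable, it can sit at the end of an RNN and be trained
    end to end against the clean sources."""
    denom = np.abs(z1) + np.abs(z2) + eps
    m1 = np.abs(z1) / denom
    m2 = np.abs(z2) / denom
    return m1 * mix_spec, m2 * mix_spec
```

By construction the two masks sum to (almost exactly) one at every time-frequency bin, so the two estimates sum back to the mixture spectrogram.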
Single-Channel Multi-Speaker Separation using Deep Clustering
In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task.
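At inference time, deep clustering maps each time-frequency bin to an embedding vector and clusters the embeddings with k-means; each cluster yields a binary mask for one speaker. The sketch below shows only this clustering step, with synthetic two-cluster embeddings standing in for a trained network's output (all sizes and the noise level are assumptions for illustration):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic stand-in for a trained network's output: each of F*T
# time-frequency bins gets a D-dim embedding near one of two centroids,
# one centroid per speaker.
rng = np.random.default_rng(0)
F, T, D = 129, 100, 20
labels_true = rng.integers(0, 2, size=F * T)   # which speaker "owns" each bin
centers = rng.normal(size=(2, D))
emb = centers[labels_true] + 0.1 * rng.normal(size=(F * T, D))

# K-means over the embeddings; each bin's cluster id defines a binary
# mask for one of the two speakers.
_, assign = kmeans2(emb, 2, minit='++', seed=0)
mask_spk0 = (assign == 0).reshape(F, T)
mask_spk1 = (assign == 1).reshape(F, T)
```

Since k-means labels are arbitrary, the recovered masks match the true speaker assignment only up to a permutation, which is why separation metrics for such systems are computed permutation-invariantly.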
Two-Step Sound Source Separation: Training on Learned Latent Targets
In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal.
Filterbank design for end-to-end speech separation
Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation
An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones.
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation
Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry.
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks
Recent research on the time-domain audio separation networks (TasNets) has brought great success to speech separation.
Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures
In blind source separation of speech signals, the inherent imbalance in the source spectrum poses a challenge for methods that rely on single-source dominance for the estimation of the mixing matrix.
Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation
One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers.