Audio Source Separation
44 papers with code • 2 benchmarks • 14 datasets
Audio Source Separation is the process of separating a mixture (e.g. a pop band recording) into isolated sounds from individual sources (e.g. just the lead vocals).
Source: Model selection for deep audio source separation via clustering analysis
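A classical way to make this concrete is time-frequency masking: transform the mixture to a spectrogram, keep only the bins dominated by the target source, and invert. The sketch below uses an oracle (ideal binary) mask on a synthetic two-tone mixture — a standard upper-bound baseline, not any particular paper's method; the tones standing in for "sources" and all parameter choices are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(fs) / fs              # 1 second of audio
s1 = np.sin(2 * np.pi * 440 * t)    # stand-in for a target source (440 Hz tone)
s2 = np.sin(2 * np.pi * 2000 * t)   # stand-in for an interfering source (2 kHz tone)
mix = s1 + s2

# Oracle (ideal binary) mask: keep time-frequency bins where source 1 dominates.
_, _, S1 = stft(s1, fs=fs, nperseg=512)
_, _, S2 = stft(s2, fs=fs, nperseg=512)
_, _, M = stft(mix, fs=fs, nperseg=512)
mask = (np.abs(S1) > np.abs(S2)).astype(float)

# Apply the mask to the mixture spectrogram and invert back to a waveform.
_, est1 = istft(mask * M, fs=fs, nperseg=512)
est1 = est1[: len(s1)]

# The estimate should match source 1 far better than the raw mixture does.
err_est = np.mean((est1 - s1) ** 2)
err_mix = np.mean((mix - s1) ** 2)
print(err_est, err_mix)
```

Learned separators (including most systems listed below) replace the oracle mask with one predicted by a neural network from the mixture alone.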
Latest papers
A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation
Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture.
Separate Anything You Describe
In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries.
Deep Audio Waveform Prior
A network with relevant deep priors is likely to generate a cleaner version of the signal before converging on the corrupted signal.
Separate What You Describe: Language-Queried Audio Source Separation
In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").
Unsupervised Music Source Separation Using Differentiable Parametric Source Models
Integrating domain knowledge in the form of source models into a data-driven method leads to high data efficiency: the proposed approach achieves good separation quality even when trained on less than three minutes of audio.
Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data
Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.
Hybrid Neural Networks for On-device Directional Hearing
On-device directional hearing requires audio source separation from a given direction while meeting stringent, human-imperceptible latency requirements.
Transfer Learning with Jukebox for Music Source Separation
In this work, we demonstrate how a publicly available, pre-trained Jukebox model can be adapted for the problem of audio source separation from a single mixed audio channel.
Unsupervised Source Separation By Steering Pretrained Music Models
We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining.
The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks
The cocktail party problem aims at isolating any source of interest within a complex acoustic scene, and has long inspired audio source separation research.