Audio Source Separation
48 papers with code • 2 benchmarks • 14 datasets
Audio Source Separation is the process of separating a mixture (e.g. a pop band recording) into isolated sounds from individual sources (e.g. just the lead vocals).
Source: Model selection for deep audio source separation via clustering analysis
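A common baseline for this task is masking in the time-frequency domain: compute a spectrum of the mixture, attenuate the bins attributed to interfering sources, and invert back to a waveform. The toy sketch below (an illustrative assumption, not taken from any of the listed papers) separates a two-sinusoid mixture with an ideal binary mask:

```python
import numpy as np

# Toy source-separation sketch: recover a 440 Hz source from a mixture
# with a 1000 Hz source via an ideal binary mask on the spectrum.
sr = 8000
t = np.arange(sr) / sr
s1 = np.sin(2 * np.pi * 440 * t)   # source 1: 440 Hz tone
s2 = np.sin(2 * np.pi * 1000 * t)  # source 2: 1000 Hz tone
mix = s1 + s2

spec = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), 1 / sr)

mask = freqs < 700                          # keep only bins belonging to source 1
est1 = np.fft.irfft(spec * mask, n=len(mix))

# The estimate should correlate strongly with s1 and barely with s2.
corr_s1 = np.corrcoef(est1, s1)[0, 1]
corr_s2 = np.corrcoef(est1, s2)[0, 1]
```

Real separators replace the hand-picked frequency threshold with a mask (or waveform) predicted by a neural network, since real sources overlap in frequency.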
Most implemented papers
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end.
Multi-scale Multi-band DenseNets for Audio Source Separation
This paper deals with the problem of audio source separation.
Sudo rm -rf: Efficient Networks for Universal Audio Source Separation
In this paper, we present an efficient neural network for end-to-end general purpose audio source separation.
Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction
Based on this idea, we drive the separator towards outputs deemed realistic by discriminator networks that are trained to distinguish real samples from separator outputs.
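The adversarial setup described here can be sketched as a standard GAN objective (an illustrative sketch, not the paper's exact loss): the discriminator scores samples in (0, 1), and the separator is trained so that its outputs score close to 1 ("real").

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # Discriminator maximizes log D(real) + log(1 - D(fake)),
    # i.e. minimizes the negative of that sum.
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def separator_adv_loss(d_fake):
    # Non-saturating adversarial term for the separator:
    # minimize -log D(fake), pushing discriminator scores toward 1.
    return -np.log(d_fake).mean()

# If the discriminator currently rejects separator outputs (score 0.1),
# the separator's adversarial loss is high; as outputs become more
# realistic (score 0.9), the loss drops.
loss_bad = separator_adv_loss(np.array([0.1]))
loss_good = separator_adv_loss(np.array([0.9]))
```

In the semi-supervised setting, this adversarial term supplements a supervised loss on the pairs for which ground-truth sources exist.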
Improved Speech Enhancement with the Wave-U-Net
We study the use of the Wave-U-Net architecture for speech enhancement, a model introduced by Stoller et al. for separating music into vocals and accompaniment.
Co-Separating Sounds of Visual Objects
Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel.
Compute and memory efficient universal sound source separation
Recent progress in audio source separation, led by deep learning, has enabled many neural network models to provide robust solutions to this fundamental estimation problem.
The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks
The cocktail party problem aims at isolating any source of interest within a complex acoustic scene, and has long inspired audio source separation research.
Learning to Separate Object Sounds by Watching Unlabeled Video
Our work is the first to learn audio source separation from large-scale "in the wild" videos containing multiple audio sources per video.
Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations
The input vector is embedded to obtain the parameters that control Feature-wise Linear Modulation (FiLM) layers.
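FiLM conditioning, as described here, applies a per-channel affine transform to intermediate feature maps: FiLM(x) = gamma * x + beta, where gamma and beta are produced from a condition vector. The sketch below (shapes and the linear "embedding" are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

channels, frames = 4, 16
features = rng.standard_normal((channels, frames))  # intermediate feature map
condition = rng.standard_normal(8)                  # e.g. encoding of the target source

# Linear embedding of the condition vector into 2 * channels FiLM parameters.
W = rng.standard_normal((2 * channels, condition.size))
params = W @ condition
gamma, beta = params[:channels], params[channels:]

# Feature-wise Linear Modulation: scale and shift each channel.
modulated = gamma[:, None] * features + beta[:, None]
```

Because only gamma and beta depend on the condition, a single U-Net can be steered to extract different sources from the same mixture by changing the input vector.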