Audio Source Separation
44 papers with code • 2 benchmarks • 14 datasets
Audio Source Separation is the process of separating a mixture (e.g. a pop band recording) into isolated sounds from individual sources (e.g. just the lead vocals).
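Many separation systems work in the time-frequency domain: a model predicts a mask over the mixture's spectrogram, and the masked spectrogram is inverted back to a waveform. The following is a minimal illustrative sketch of that idea (not any listed paper's method), using an oracle "ideal ratio mask" on two synthetic tones in place of a trained model's prediction:

```python
# Sketch of mask-based source separation: mix two synthetic tones,
# build an ideal ratio mask in the time-frequency domain, apply it to
# the mixture's STFT, and invert back to a waveform.
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(fs) / fs
src_a = np.sin(2 * np.pi * 440 * t)   # stand-in for one source (e.g. vocals)
src_b = np.sin(2 * np.pi * 1760 * t)  # stand-in for the accompaniment
mix = src_a + src_b

# Time-frequency representations (complex STFTs)
_, _, Z_mix = stft(mix, fs=fs, nperseg=512)
_, _, Z_a = stft(src_a, fs=fs, nperseg=512)
_, _, Z_b = stft(src_b, fs=fs, nperseg=512)

# Ideal ratio mask: fraction of each bin's magnitude belonging to source A.
# A trained separator would predict this mask instead of using the oracle.
mask_a = np.abs(Z_a) / (np.abs(Z_a) + np.abs(Z_b) + 1e-8)

# Apply the mask to the mixture and invert back to a time-domain signal
_, est_a = istft(Z_mix * mask_a, fs=fs, nperseg=512)
est_a = est_a[: len(src_a)]

corr = np.corrcoef(est_a, src_a)[0, 1]
print(corr > 0.95)  # the two tones are spectrally disjoint, so recovery is near-perfect
```

Because the two tones occupy different frequency bins, the mask separates them almost exactly; real mixtures overlap in time-frequency, which is what learned models must handle.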
Source: Model selection for deep audio source separation via clustering analysis
Latest papers with no code
Gull: A Generative Multifunctional Audio Codec
We introduce Gull, a generative multifunctional audio codec.
Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation
In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources.
GASS: Generalizing Audio Source Separation with Large-scale Data
Here, we study a single general audio source separation (GASS) model trained to separate speech, music, and sound events in a supervised fashion with a large-scale dataset.
Language-Guided Audio-Visual Source Separation via Trimodal Consistency
We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
Applying a diffusion-model vocoder, pretrained to model single-speaker voices, to the output of a deterministic separation model leads to state-of-the-art separation results.

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks
In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation: an audio mixture such as a movie soundtrack or podcast is separated into the three broad categories of speech, music, and sound effects (SFX, understood to include ambient noise and natural sound events).
Hyperbolic Audio Source Separation
We introduce a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features.
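The hyperbolic model typically used for such hierarchical embeddings is the Poincaré ball, where distances grow rapidly near the boundary, so coarse categories can sit near the origin and fine-grained features near the edge. As a hedged sketch of the underlying geometry (not the paper's implementation), the geodesic distance on the Poincaré ball is:

```python
# Geodesic distance on the Poincare ball (unit-ball model of hyperbolic
# space), commonly used to embed hierarchies with little distortion.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance between two points strictly inside the unit ball."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq_dist / (denom + eps))

root = np.array([0.0, 0.0])  # a coarse node (e.g. a source class) near the origin
leaf = np.array([0.0, 0.9])  # a fine-grained node pushed toward the boundary

# Hyperbolic distance exceeds the Euclidean distance between the same points,
# and blows up as points approach the boundary of the ball.
print(poincare_distance(root, leaf) > np.linalg.norm(root - leaf))  # True
```

This boundary behavior is what lets hyperbolic embeddings pack exponentially many leaf nodes at roughly equal distance from a shared parent, mirroring a tree-like hierarchy of sources and their time-frequency features.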
Differentiable Dictionary Search: Integrating Linear Mixing with Deep Non-Linear Modelling for Audio Source Separation
This paper describes several improvements to a new method for signal decomposition that we recently formulated under the name of Differentiable Dictionary Search (DDS).
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
In this paper, we propose to use this connection between audio and visual dynamics for solving two challenging tasks simultaneously, namely: (i) separating audio sources from a mixture using visual cues, and (ii) predicting the 3D visual motion of a sounding source using its separated audio.
Hierarchic Temporal Convolutional Network With Cross-Domain Encoder for Music Source Separation
In this paper, we propose a model that combines complex-spectrogram-domain and time-domain features through a cross-domain encoder (CDE) and adopts a hierarchic temporal convolutional network (HTCN) for separating multiple music sources.