To address this issue, we propose a novel method to learn source-aware latent representations of music through a Vector-Quantized Variational Auto-Encoder (VQ-VAE). We train our VQ-VAE to encode an input mixture into a tensor of integers in a discrete latent space, and design them to have a decomposed structure which allows humans to manipulate the latent vector in a source-aware manner.
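The decomposed latent structure can be sketched as one codebook per source, so that editing one source's discrete codes leaves the others untouched. The sketch below is a minimal illustration under assumed toy sizes (four sources, small codebooks); the source names, dimensions, and the `quantize` helper are hypothetical, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sources = 4          # e.g., vocals, drums, bass, other (assumed split)
codes_per_source = 8   # hypothetical codebook size per source
dim = 16               # hypothetical latent dimension of each sub-vector

# One codebook per source: the "decomposed" structure lets a user swap
# or silence one source's codes without touching the other sources.
codebooks = rng.normal(size=(n_sources, codes_per_source, dim))

def quantize(latents):
    """Map each source's continuous latent to its nearest codebook index."""
    indices = np.empty(n_sources, dtype=int)
    for s in range(n_sources):
        dists = np.linalg.norm(codebooks[s] - latents[s], axis=1)
        indices[s] = int(np.argmin(dists))
    return indices

latents = rng.normal(size=(n_sources, dim))
idx = quantize(latents)
print(idx.shape)  # one discrete code index per source
```

Because each source owns a disjoint slice of the discrete latent, a source-aware edit is just a per-slice replacement of indices before decoding.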
Conditioned source separations have attracted significant attention because of their flexibility, applicability, and extensibility.
This paper proposes a two-stream neural network for music demixing, called KUIELab-MDX-Net, which shows a good balance of performance and required resources.
Ranked #7 on Music Source Separation on MUSDB18
This paper proposes a neural network that performs audio transformations to user-specified sources (e.g., vocals) of a given audio track according to a given description while preserving other sources not mentioned in the description.
Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns.
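The core of an FT block can be sketched as a fully-connected layer applied along the frequency axis of every spectrogram frame, with weights shared across time, so that patterns spanning distant frequency bins (e.g., harmonics) can interact. The sketch below is a minimal single-layer illustration with assumed toy dimensions; real FT blocks stack such layers with bottlenecks and further nonlinearities.

```python
import numpy as np

rng = np.random.default_rng(0)

n_freq, n_time = 64, 32  # spectrogram bins x frames (toy sizes)
spec = rng.normal(size=(n_freq, n_time))

# Frequency-axis linear map, shared across all time frames: each output
# bin is a learned combination of every input bin of the same frame.
W = rng.normal(size=(n_freq, n_freq)) / np.sqrt(n_freq)
b = np.zeros((n_freq, 1))

def ft_block(x):
    """Apply the frequency-axis linear map plus ReLU to each time frame."""
    return np.maximum(W @ x + b, 0.0)

out = ft_block(spec)
print(out.shape)  # same shape as the input spectrogram
```

A convolution along frequency would only mix neighboring bins; the fully-connected map is what lets the block capture frequency patterns with large spans, which is the motivation the abstract cites.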
Ranked #19 on Music Source Separation on MUSDB18
Singing Voice Separation (SVS) aims to separate the singing voice from a given mixed musical signal.