This paper proposes several improvements for music separation with deep neural networks (DNNs), namely a multi-domain loss (MDL) and two combination schemes.
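The combination of a time-domain term and a spectrogram-domain term is the core idea behind a multi-domain loss. A minimal numpy sketch of that idea follows; the weighting, STFT parameters, and function names here are illustrative assumptions, not the paper's exact MDL definition.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude spectrogram via a simple framed rFFT with a Hann window."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def multi_domain_loss(est, ref, alpha=0.5):
    """Weighted sum of a time-domain L1 term and a spectrogram L1 term.
    Illustrative only: the paper's MDL may define and weight terms differently."""
    time_l1 = np.mean(np.abs(est - ref))
    spec_l1 = np.mean(np.abs(stft_mag(est) - stft_mag(ref)))
    return alpha * time_l1 + (1 - alpha) * spec_l1
```

A perfect estimate gives zero loss in both domains; any mismatch is penalized in whichever domain it shows up most clearly.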
Ranked #10 on Music Source Separation on MUSDB18
We present and release Spleeter, a new tool for music source separation with pre-trained models. Spleeter was designed with ease of use, separation performance, and speed in mind.

Ranked #8 on Music Source Separation on MUSDB18 (using extra training data)
In this paper, we argue for the importance of dense, simultaneous modeling of multiresolution representations and propose a novel CNN architecture called the densely connected multidilated DenseNet (D3Net).
In this paper, we argue for the importance of rapid receptive-field growth and simultaneous modeling of multi-resolution data within a single convolution layer, and propose a novel CNN architecture called the densely connected dilated DenseNet (D3Net).
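The receptive-field arithmetic that motivates dilated architectures like D3Net is easy to check directly: with exponentially growing dilation factors, the receptive field of a stack of 1-D convolutions grows exponentially in depth rather than linearly. This sketch only shows that standard dilated-stack arithmetic, not D3Net's multidilated layer itself.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked 1-D dilated convolutions (stride 1)."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Exponential dilation schedule vs. a plain convolution stack:
print(receptive_field(3, [1, 2, 4, 8]))   # → 31
print(receptive_field(3, [1, 1, 1, 1]))   # → 9
```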
Ranked #2 on Music Source Separation on MUSDB18 (using extra training data)
Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns.
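At their core, Frequency Transformation blocks apply a learned linear transform along the frequency axis of a spectrogram, so that every output frequency bin can attend to patterns across all input bins. A minimal numpy sketch of that operation, with assumed shapes and names:

```python
import numpy as np

def ft_block(spec, weight):
    """Apply a learned linear transform along the frequency axis.
    spec: (time, freq); weight: (freq, freq). Illustrative sketch of the
    core idea behind Frequency Transformation blocks; names are assumed."""
    return np.einsum('tf,fg->tg', spec, weight)

T, F = 100, 257
spec = np.random.default_rng(0).random((T, F))
out = ft_block(spec, np.eye(F))  # identity weight leaves the spectrogram unchanged
```

In a real model the weight matrix is trained, and the block is typically wrapped with nonlinearities and normalization.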
Ranked #9 on Music Source Separation on MUSDB18
In this work we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation.
In recent years, deep learning has surpassed traditional approaches to the problem of singing voice separation.
Convolutional neural network (CNN)- and long short-term memory (LSTM)-based models that take spectrograms or waveforms as input are commonly used for deep-learning-based audio source separation.
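The spectrogram front end such models rely on can be sketched in a few lines of numpy: frame the waveform, window each frame, and take the magnitude of the real FFT. Parameters below (FFT size, hop) are typical assumed values.

```python
import numpy as np

def spectrogram(x, n_fft=1024, hop=256):
    """Magnitude spectrogram from a waveform: frame, window, rFFT.
    A minimal numpy sketch of the spectrogram input to separation models."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))  # shape: (n_frames, n_fft // 2 + 1)
```

Waveform-domain models skip this step and operate on the raw samples directly.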