LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation

22 Oct 2020 · Woosung Choi, Minseok Kim, Jaehwa Chung, Soonyoung Jung

Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns. The goal of this paper is to extend the FT block to fit the multi-source task. We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns. We also propose the Gated Point-wise Convolutional Modulation (GPoCM), an extension of Feature-wise Linear Modulation (FiLM), to modulate internal features. By employing these two novel methods, we extend the Conditioned-U-Net (CUNet) for multi-source separation, and the experimental results indicate that our LaSAFT and GPoCM can improve the CUNet's performance, achieving state-of-the-art SDR performance on several MUSDB18 source separation tasks.
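The abstract contrasts FiLM with the proposed GPoCM. As a rough illustration of that difference, the sketch below implements plain FiLM (a per-channel affine transform from condition-generated scale and shift) and a GPoCM-style gate (the input multiplied by the sigmoid of a point-wise convolution whose weights come from the condition). This is a minimal numpy sketch of the two modulation forms, not the paper's implementation; the function names and the assumption that the condition-generated parameters are already computed are hypothetical.

```python
import numpy as np

def film(x, gamma, beta):
    # FiLM: per-channel affine modulation.
    # x: (channels, freq, time); gamma, beta: (channels,) generated
    # from the condition embedding (assumed precomputed here).
    return gamma[:, None, None] * x + beta[:, None, None]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gpocm(x, W, b):
    # PoCM: a point-wise (1x1) convolution whose weights W (C_out, C_in)
    # and bias b (C_out,) are generated from the condition embedding.
    pocm = np.einsum('oc,cft->oft', W, x) + b[:, None, None]
    # GPoCM gates the input features with the sigmoid of the PoCM output,
    # so each feature is scaled by a value in (0, 1).
    return x * sigmoid(pocm)
```

Both functions preserve the feature map's shape; the GPoCM variant can only attenuate features (multiplicative gate in (0, 1)), whereas FiLM can scale and shift them arbitrarily.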


Results from the Paper


Task: Music Source Separation
Dataset: MUSDB18
Model: LaSAFT+GPoCM

Metric         Value   Global Rank
SDR (vocals)   7.33    #12
SDR (drums)    5.68    #25
SDR (other)    4.87    #13
SDR (bass)     5.63    #18
SDR (avg)      5.88    #20
