Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation

12 Sep 2019 · Tingle Li, Jia-Wei Chen, Haowen Hou, Ming Li

Convolutional Neural Network (CNN) or Long Short-Term Memory (LSTM) based models, taking spectrograms or waveforms as input, are commonly used for deep-learning-based audio source separation. In this paper, we propose a Sliced Attention-based neural network (Sams-Net) in the spectrogram domain for the music source separation task. It enables spectral feature interactions through a multi-head attention mechanism, allows easier parallel computation than LSTMs, and has a larger receptive field than CNNs. Experimental results on the MUSDB18 dataset show that the proposed method, with fewer parameters, outperforms most state-of-the-art DNN-based methods.
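As a rough illustration of the sliced attention idea, the sketch below applies multi-head self-attention independently within contiguous slices of a spectrogram's time axis. This is a hypothetical reading of the abstract, not the paper's actual architecture: the slice layout, projection weights, and function names are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sliced_multihead_attention(spec, num_slices=4, num_heads=2, rng=None):
    """Illustrative sliced multi-head self-attention over a spectrogram.

    spec: (time, freq) magnitude spectrogram. The time axis is cut into
    `num_slices` contiguous slices, and attention is computed within each
    slice independently (hypothetical interpretation of "sliced").
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, F = spec.shape
    assert T % num_slices == 0 and F % num_heads == 0
    d_head = F // num_heads
    # Random projection weights stand in for learned parameters.
    Wq, Wk, Wv = (rng.standard_normal((F, F)) / np.sqrt(F) for _ in range(3))
    out = np.empty_like(spec)
    step = T // num_slices
    for s in range(num_slices):
        x = spec[s * step:(s + 1) * step]            # (step, F)
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        # Split the feature dimension into heads: (heads, step, d_head).
        qh = q.reshape(step, num_heads, d_head).transpose(1, 0, 2)
        kh = k.reshape(step, num_heads, d_head).transpose(1, 0, 2)
        vh = v.reshape(step, num_heads, d_head).transpose(1, 0, 2)
        # Scaled dot-product attention within the slice, per head.
        att = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)
        yh = att @ vh                                # (heads, step, d_head)
        out[s * step:(s + 1) * step] = yh.transpose(1, 0, 2).reshape(step, F)
    return out
```

Restricting attention to slices keeps each attention matrix small (`step × step` rather than `T × T`), which is one plausible way such a design could ease parallel computation compared with sequential LSTM processing.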


Results from the Paper


Task: Music Source Separation · Dataset: MUSDB18 · Model: Sams-Net

Metric        Value   Global Rank
SDR (vocals)  6.61    #22
SDR (drums)   6.63    #19
SDR (other)   4.09    #22
SDR (bass)    5.25    #23
SDR (avg)     5.65    #23
