TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Separation	WSJ0-2mix	SepFormer	SI-SDRi	22.3	# 8
Speech Separation	WSJ0-2mix	SepFormer	SDRi	22.4	# 1
Speech Separation	WSJ0-3mix	SepFormer	SI-SDRi	19.5	# 7

Attention is All You Need in Speech Separation

25 Oct 2020 · Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short- and long-term dependencies with a multi-scale approach that employs Transformers. The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets: it reaches an SI-SNRi of 22.3 dB on WSJ0-2mix and 19.5 dB on WSJ0-3mix. The SepFormer inherits the parallelization advantages of Transformers and achieves competitive performance even when the encoded representation is downsampled by a factor of 8. It is thus significantly faster and less memory-demanding than the latest speech separation systems with comparable performance.
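The figures above are reported as SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated output over the unprocessed mixture (the results table labels the same quantity SI-SDRi; the two names are commonly used interchangeably). The following is a minimal sketch of how such a number can be computed for a single source. It is not taken from the paper or from SpeechBrain: the function names `si_snr` and `si_snr_improvement`, the `eps` stabiliser, and the toy sine signals are all assumptions made for illustration.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def _zero_mean(x):
    """Remove the mean so a constant offset does not affect the score."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB of `estimate` with respect to `target`.

    The estimate is projected onto the target; whatever is left over is
    treated as noise. Rescaling the estimate leaves the ratio unchanged.
    `eps` is an illustrative guard against division by zero.
    """
    est = _zero_mean(estimate)
    ref = _zero_mean(target)
    scale = _dot(est, ref) / (_dot(ref, ref) + eps)
    projection = [scale * r for r in ref]
    noise = [e - p for e, p in zip(est, projection)]
    ratio = (_dot(projection, projection) + eps) / (_dot(noise, noise) + eps)
    return 10.0 * math.log10(ratio)


def si_snr_improvement(estimate, target, mixture, eps=1e-8):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, target, eps) - si_snr(mixture, target, eps)


if __name__ == "__main__":
    # Toy signals (hypothetical): two sines standing in for two speakers.
    n = 8000
    target = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    other = [math.sin(2 * math.pi * 13 * i / n) for i in range(n)]
    mixture = [t + o for t, o in zip(target, other)]
    # Pretend separator output: interferer attenuated to 10% amplitude.
    estimate = [t + 0.1 * o for t, o in zip(target, other)]
    print(f"SI-SNRi: {si_snr_improvement(estimate, target, mixture):.2f} dB")
```

In this toy setup the interfering sine is attenuated tenfold in amplitude, so the improvement comes out at roughly 20 dB. Benchmark numbers such as those in the abstract are averages over a full test set and over all sources in each mixture, which this single-source sketch does not attempt to reproduce.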

Code

speechbrain/speechbrain (official) · 7,879 stars

SungFeng-Huang/SSL-pretraining-sepa…

Zhongyang-debug/Attention-Is-All-Yo…

2024-MindSpore-1/Code3

Tasks

Speech Separation

Datasets

WSJ0-2mix

Results from the Paper

Ranked #7 on Speech Separation on WSJ0-3mix

Methods

Dense Connections • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • PReLU • ReLU • Residual Connection • Scaled Dot-Product Attention • SepFormer • Softmax
