Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Speech Enhancement	WHAM!	SepFormer	PESQ	3.07	#1
Speech Enhancement	WHAM!	SepFormer	SDR (dB)	15.04	#1
Speech Enhancement	WHAM!	SepFormer	SI-SNR (dB)	14.35	#1
Speech Enhancement	WHAMR!	SepFormer	PESQ	2.84	#1
Speech Enhancement	WHAMR!	SepFormer	SDR (dB)	12.29	#1
Speech Enhancement	WHAMR!	SepFormer	SI-SNR (dB)	10.58	#1
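
The SI-SNR figures above are in decibels. As a rough illustration of what the metric measures, here is a minimal sketch of the commonly used scale-invariant SNR definition (zero-mean both signals, project the estimate onto the target, compare the projected energy to the residual energy). The function name `si_snr`, the `eps` constant, and the toy signals are assumptions for illustration; the evaluation code behind the leaderboard numbers may differ in details.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the "target" component.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]

    # Whatever is left after the projection counts as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]

    signal_power = sum(s * s for s in s_target)
    noise_power = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(signal_power / noise_power + eps)


# Toy example (hypothetical signals, not data from the paper).
clean = [math.sin(0.1 * i) for i in range(200)]
noisy = [c + 0.1 * math.cos(0.37 * i) for i, c in enumerate(clean)]
print(f"SI-SNR of the noisy toy signal: {si_snr(noisy, clean):.2f} dB")
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.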

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/on-using-transformers-for-speech-separation/speech-enhancement-on-wham)](https://paperswithcode.com/sota/speech-enhancement-on-wham?p=on-using-transformers-for-speech-separation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/on-using-transformers-for-speech-separation/speech-enhancement-on-whamr)](https://paperswithcode.com/sota/speech-enhancement-on-whamr?p=on-using-transformers-for-speech-separation)`

Exploring Self-Attention Mechanisms for Speech Separation

6 Feb 2022 · Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi

Transformers have enabled impressive improvements in deep learning. They often outperform recurrent and convolutional models across many tasks while taking advantage of parallel processing. Recently, we proposed the SepFormer, which obtains state-of-the-art performance in speech separation on the WSJ0-2/3 Mix datasets. This paper studies Transformers for speech separation in depth. In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!. Moreover, we extend our model to perform speech enhancement and provide experimental evidence on denoising and dereverberation tasks. Finally, we investigate, for the first time in speech separation, the use of efficient self-attention mechanisms such as Linformers, Longformers, and Reformers. We found that they reduce memory requirements significantly. For example, we show that the Reformer-based attention outperforms the popular Conv-TasNet model on the WSJ0-2Mix dataset while being faster at inference and comparable in terms of memory consumption.
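
To make the memory claim concrete, here is a back-of-the-envelope sketch. Full self-attention stores one score per query-key pair, so the score matrix grows quadratically with sequence length, while a Linformer-style layer projects keys and values to a fixed length, so it grows linearly. The helper `attention_score_entries` and the numbers 4000 and 256 are illustrative assumptions, not settings taken from the paper.

```python
from typing import Optional


def attention_score_entries(seq_len: int, projected_len: Optional[int] = None) -> int:
    """Count attention-score entries per head for a single sequence.

    With projected_len=None this models full self-attention (seq_len x seq_len).
    Otherwise it models a Linformer-style layer whose keys and values are
    projected to projected_len positions (seq_len x projected_len).
    """
    key_len = seq_len if projected_len is None else projected_len
    return seq_len * key_len


full = attention_score_entries(4000)                          # 16_000_000 entries
projected = attention_score_entries(4000, projected_len=256)  # 1_024_000 entries
print(f"full: {full}, projected: {projected}, ratio: {full / projected}")
```

This counts only the score matrix; activations, weights, and the projection matrices themselves add to the real footprint, so measured savings will differ.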

Code

speechbrain/speechbrain official

Tasks

Denoising

Speech Enhancement

Speech Separation

Datasets

WSJ0-2mix • WHAM! • LibriMix • WHAMR!

Results from the Paper

Ranked #1 on Speech Enhancement on WHAM!

Methods

ConvTasNet • Dense Connections • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • PReLU • ReLU • Residual Connection • Scaled Dot-Product Attention • SepFormer • Softmax
