TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Separation	Libri2Mix	TDANet Large	SI-SDRi	17.4	# 4
Speech Separation	Libri2Mix	TDANet	SI-SDRi	16.9	# 5
Speech Separation	WHAM!	TDANet Large	SI-SDRi	15.2	# 3
Speech Separation	WHAM!	TDANet	SI-SDRi	14.8	# 4
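
The SI-SDRi values above are scale-invariant signal-to-distortion ratio improvements in dB: the SI-SDR of the separated estimate minus the SI-SDR of the unprocessed mixture, both measured against the clean reference. The following is a minimal sketch of that standard definition in plain Python. It is not the evaluation code behind these numbers, and the function names and the `eps` stabiliser are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample lists."""
    n = len(reference)
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Project the estimate onto the reference to get the target component.
    ref_energy = sum(r * r for r in ref)
    alpha = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)
    target = [alpha * r for r in ref]
    # Everything not explained by the scaled reference counts as distortion.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy example with synthetic sinusoids (not real speech).
n = 200
reference = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
interferer = [math.sin(2 * math.pi * 13 * t / n) for t in range(n)]
mixture = [r + i for r, i in zip(reference, interferer)]
estimate = [r + 0.1 * i for r, i in zip(reference, interferer)]
print(round(si_sdr_improvement(estimate, mixture, reference), 2))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant.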

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/an-efficient-encoder-decoder-architecture/speech-separation-on-wham)](https://paperswithcode.com/sota/speech-separation-on-wham?p=an-efficient-encoder-decoder-architecture)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/an-efficient-encoder-decoder-architecture/speech-separation-on-libri2mix)](https://paperswithcode.com/sota/speech-separation-on-libri2mix?p=an-efficient-encoder-decoder-architecture)`

An efficient encoder-decoder architecture with top-down attention for speech separation

30 Sep 2022 · Kai Li, Runxuan Yang, Xiaolin Hu

Deep neural networks have shown excellent prospects in speech separation tasks. However, obtaining good results while keeping model complexity low remains challenging in real-world applications. In this paper, we propose TDANet, a bio-inspired, efficient encoder-decoder architecture that mimics the brain's top-down attention and reduces model complexity without sacrificing performance. The top-down attention in TDANet is extracted by the global attention (GA) module and the cascaded local attention (LA) layers. The GA module takes multi-scale acoustic features as input to extract a global attention signal, which then modulates features of different scales through direct top-down connections. The LA layers use features of adjacent layers as input to extract a local attention signal, which modulates the lateral input in a top-down manner. On three benchmark datasets, TDANet consistently achieved separation performance competitive with previous state-of-the-art (SOTA) methods at higher efficiency. Specifically, TDANet's multiply-accumulate operations (MACs) are only 5% of those of Sepformer, one of the previous SOTA models, and its CPU inference time is only 10% of Sepformer's. In addition, a large version of TDANet obtained SOTA results on three datasets, with MACs still only 10% of Sepformer's and CPU inference time only 24% of Sepformer's.
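
The efficiency claims in the abstract are stated in multiply-accumulate operations (MACs). As a rough illustration of how such counts arise, the sketch below counts MACs for a single 1-D convolution layer using the usual formula (output length × output channels × input channels per group × kernel size, bias ignored). The layer shapes are invented for the example and are not taken from TDANet or Sepformer. The helper names are hypothetical.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Number of output frames of a 1-D convolution over `length` input frames."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv1d_macs(in_channels, out_channels, kernel_size, length,
                stride=1, padding=0, dilation=1, groups=1):
    """Multiply-accumulate count of one 1-D convolution (bias not counted)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    out_len = conv1d_output_length(length, kernel_size, stride, padding, dilation)
    # Each output sample needs (in_channels / groups) * kernel_size MACs.
    return out_len * out_channels * (in_channels // groups) * kernel_size


# Hypothetical layer: 64 channels, kernel size 3, 100 input frames.
standard = conv1d_macs(64, 64, 3, 100, padding=1)
depthwise = conv1d_macs(64, 64, 3, 100, padding=1, groups=64)
print(standard, depthwise, depthwise / standard)
```

Summing such per-layer counts over a whole network gives the kind of total that the abstract compares between models. The reported percentages come from the paper, not from this sketch.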

Code

JusperLee/TDANet official

205

Tasks

Speech Separation

Datasets

WHAM! LibriMix

Results from the Paper

Ranked #3 on Speech Separation on WHAM!
