Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation

25 Jan 2023 · Shahar Lutati, Eliya Nachmani, Lior Wolf

The problem of speech separation, also known as the cocktail party problem, refers to the task of isolating a single speech signal from a mixture of speech signals. Previous work on source separation derived an upper bound for the source separation task in the domain of human speech; this bound holds for deterministic models. Recent advancements in generative models challenge this bound. We show how the upper bound can be generalized to the case of random generative models. Applying a diffusion-model vocoder that was pretrained to model single-speaker voices to the output of a deterministic separation model leads to state-of-the-art separation results. It is shown that this requires combining the output of the separation model with that of the diffusion model. In our method, a linear combination is performed in the frequency domain, using weights that are inferred by a learned model. We show state-of-the-art results on 2, 3, 5, 10, and 20 speakers on multiple benchmarks. In particular, for two speakers, our method is able to surpass what was previously considered the upper performance bound.
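The core combination step described in the abstract (a learned, frequency-domain linear mixture of the deterministic separator's output and the diffusion vocoder's resynthesis) can be pictured with the minimal sketch below. This is not the authors' code: the STFT settings, the `FreqDomainCombiner` module, its small `weight_net`, and the per-frequency sigmoid mixing rule are all illustrative assumptions standing in for the paper's learned combination model.

```python
# Hedged sketch: combine separator output with a pretrained diffusion
# vocoder's output via a learned linear combination in the STFT domain.
import torch
import torch.nn as nn

N_FFT, HOP = 512, 128  # assumed STFT settings, not taken from the paper


class FreqDomainCombiner(nn.Module):
    def __init__(self, n_freq: int = N_FFT // 2 + 1):
        super().__init__()
        # Tiny stand-in for the learned model that infers mixing weights
        # from both signals (the real architecture is not specified here).
        self.weight_net = nn.Sequential(
            nn.Linear(2 * n_freq, 256), nn.ReLU(),
            nn.Linear(256, n_freq), nn.Sigmoid(),
        )
        self.register_buffer("window", torch.hann_window(N_FFT))

    def forward(self, sep_wave: torch.Tensor, voc_wave: torch.Tensor) -> torch.Tensor:
        # STFTs of the deterministic separator output and the vocoder output.
        S = torch.stft(sep_wave, N_FFT, HOP, window=self.window, return_complex=True)
        V = torch.stft(voc_wave, N_FFT, HOP, window=self.window, return_complex=True)
        # Infer a weight per (frequency, frame) from both magnitude spectra.
        feats = torch.cat([S.abs(), V.abs()], dim=1).transpose(1, 2)  # (B, T, 2F)
        alpha = self.weight_net(feats).transpose(1, 2)                # (B, F, T)
        # Linear combination in the frequency domain, then back to a waveform.
        mixed = alpha * S + (1.0 - alpha) * V
        return torch.istft(mixed, N_FFT, HOP, window=self.window,
                           length=sep_wave.shape[-1])


# Usage sketch: sep_wave and voc_wave are (batch, samples) estimates of the
# same speaker from the separator and the diffusion vocoder, respectively.
combiner = FreqDomainCombiner()
sep_wave, voc_wave = torch.randn(2, 16000), torch.randn(2, 16000)
enhanced = combiner(sep_wave, voc_wave)  # (2, 16000)
```

The design intuition is that the deterministic separator and the generative vocoder make different kinds of errors, so letting a small learned model choose, per frequency bin, how much to trust each estimate can outperform either one alone; the exact weighting network used in the paper may differ from this sketch.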


Results from the Paper


Task                Dataset      Model                  Metric Name   Metric Value (dB)   Global Rank
Speech Separation   Libri10Mix   Separate And Diffuse   SI-SDRi       9                   # 1
Speech Separation   Libri20Mix   Separate And Diffuse   SI-SDRi       5.2                 # 1
Speech Separation   Libri2Mix    Separate And Diffuse   SI-SDRi       21.5                # 3
Speech Separation   Libri5Mix    Separate And Diffuse   SI-SDRi       14.2                # 1
Speech Separation   WSJ0-2mix    Separate And Diffuse   SI-SDRi       23.9                # 3
Speech Separation   WSJ0-3mix    Separate And Diffuse   SI-SDRi       20.9                # 4