TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE (dB)	GLOBAL RANK
Streaming Target Sound Extraction	FSDSoundScapes	Waveformer	SI-SNRi	9.43	# 1
Target Sound Extraction	FSDSoundScapes	Waveformer	SI-SNRi	9.43	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/real-time-target-sound-extraction/streaming-target-sound-extraction-on)](https://paperswithcode.com/sota/streaming-target-sound-extraction-on?p=real-time-target-sound-extraction)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/real-time-target-sound-extraction/target-sound-extraction-on-fsdsoundscapes)](https://paperswithcode.com/sota/target-sound-extraction-on-fsdsoundscapes?p=real-time-target-sound-extraction)`

Real-Time Target Sound Extraction

4 Nov 2022 · Bandhav Veluri, Justin Chan, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota

We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner while also leveraging the generalization performance of transformer-based architectures. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. We provide code, dataset, and audio samples: https://waveformer.cs.washington.edu/.
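
For readers unfamiliar with the SI-SNRi metric quoted in the abstract, the following is a minimal sketch of how a scale-invariant signal-to-noise ratio improvement is commonly computed: the SI-SNR of the extracted signal against the clean target, minus the SI-SNR of the unprocessed mixture against the same target. This is not the authors' evaluation code. The function names `si_snr` and `si_snr_improvement`, the `eps` stabilizer, the 16 kHz sample rate, and the synthetic sine-plus-noise signals are all assumptions made for illustration.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB of `estimate` with respect to `reference`.

    Hypothetical helper for illustration; `eps` only guards against
    division by zero and log of zero.
    """
    # Remove the DC offset so the measure ignores constant shifts.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: this is the "target" part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything not explained by the reference counts as noise.
    noise = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi in dB: gain of the estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Synthetic example (assumed data, not from the FSDSoundScapes dataset).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0                # one second at an assumed 16 kHz
reference = np.sin(2 * np.pi * 440.0 * t)     # clean target sound
interference = rng.standard_normal(t.shape)   # background noise
mixture = reference + interference            # what the model would hear
estimate = reference + 0.1 * interference     # stand-in for a model output

print(f"SI-SNRi: {si_snr_improvement(estimate, mixture, reference):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate does not change the score, so a model cannot raise the metric simply by changing its output gain.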
