Target Sound Extraction

5 papers with code • 3 benchmarks • 3 datasets

Target Sound Extraction is the task of extracting the sound that corresponds to a given class from an audio mixture. The mixture may also contain background noise at a relatively low amplitude compared to the foreground components. The target sound class is provided to the model as a string, an integer, or a one-hot encoding.
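As a rough illustration of the inputs described above, the following sketch builds a toy mixture with low-amplitude background noise and encodes the target class query. The class names, the sine-wave "sources", the noise scale, and the helper names `one_hot` and `make_mixture` are all assumptions made for the example; they are not taken from any of the listed papers.

```python
import numpy as np

# Hypothetical label set; the class names are illustrative only.
CLASSES = ["dog_bark", "siren", "speech"]


def one_hot(label, classes=CLASSES):
    """Encode a class given as a string or an integer index as a one-hot vector."""
    index = classes.index(label) if isinstance(label, str) else int(label)
    vec = np.zeros(len(classes), dtype=np.float32)
    vec[index] = 1.0
    return vec


def make_mixture(sources, noise_scale=0.05, seed=0):
    """Sum the foreground sources and add low-amplitude Gaussian background noise."""
    rng = np.random.default_rng(seed)
    foreground = np.sum(list(sources.values()), axis=0)
    noise = noise_scale * rng.standard_normal(foreground.shape)
    return foreground + noise


# Toy one-second signals at 8 kHz standing in for real recordings.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
sources = {
    "dog_bark": np.sin(2 * np.pi * 440 * t),
    "siren": np.sin(2 * np.pi * 880 * t),
}

mixture = make_mixture(sources)   # model input 1: the audio mixture
query = one_hot("siren")          # model input 2: the class query (string form)
target = sources["siren"]         # the signal a model would be trained to output
```

Here the same query could equally be passed as the integer index `1`; a real system would replace the sine waves with recorded audio and feed `mixture` and `query` to a trained extraction model.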

Most implemented papers

Real-Time Target Sound Extraction

vb000/waveformer 4 Nov 2022

We present the first neural network model to achieve real-time and streaming target sound extraction.

Target Sound Extraction with Variable Cross-modality Clues

lichenda/multi-clue-tse-data 15 Mar 2023

Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources.

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction


Common target sound extraction (TSE) approaches have primarily relied on discriminative methods to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background.

Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables

vb000/SemanticHearing 1 Nov 2023

To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use.

CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction

aisaka0v0/clapsep 27 Feb 2024

Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings.