Target Sound Extraction

4 papers with code • 3 benchmarks • 3 datasets

Target Sound Extraction is the task of extracting the sound belonging to a given class from an audio mixture. The mixture may also contain background noise whose amplitude is relatively low compared to the foreground components. The target sound class is given to the model as input in the form of a string, an integer, or a one-hot encoding of the class.
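The conditioning described above can be sketched as a small normalization step that turns any of the three query forms (string, integer, one-hot) into a one-hot vector, plus a helper that builds a mixture containing low-amplitude background noise. This is a minimal sketch in plain Python, assuming a hypothetical four-class label set (`CLASSES`), hypothetical helper names (`to_one_hot`, `make_mixture`), and an assumed noise gain of 0.1. Real datasets and systems define their own label sets and mixing recipes.

```python
from typing import List, Sequence, Union

# Hypothetical label set for illustration; real TSE datasets define their own classes.
CLASSES: List[str] = ["dog_bark", "siren", "speech", "doorbell"]

Query = Union[str, int, Sequence[float]]


def to_one_hot(query: Query, classes: Sequence[str] = CLASSES) -> List[float]:
    """Normalize a string, integer, or one-hot query into a one-hot vector."""
    n = len(classes)
    if isinstance(query, str):  # check str first, since str is also a Sequence
        index = classes.index(query)  # raises ValueError for unknown labels
    elif isinstance(query, int):
        if not 0 <= query < n:
            raise ValueError(f"class index {query} out of range [0, {n})")
        index = query
    else:
        vec = [float(v) for v in query]
        # A valid one-hot vector has exactly one 1.0 and zeros elsewhere.
        if len(vec) != n or sorted(vec) != [0.0] * (n - 1) + [1.0]:
            raise ValueError(f"expected a one-hot vector of length {n}")
        return vec
    one_hot = [0.0] * n
    one_hot[index] = 1.0
    return one_hot


def make_mixture(
    target: Sequence[float],
    interferers: Sequence[Sequence[float]],
    noise: Sequence[float],
    noise_gain: float = 0.1,  # assumed value: keeps background noise low in amplitude
) -> List[float]:
    """Sum the target, interfering sources, and scaled background noise sample by sample."""
    mixture = list(target)
    for source in interferers:
        for i in range(len(mixture)):
            mixture[i] += source[i]
    for i in range(len(mixture)):
        mixture[i] += noise_gain * noise[i]
    return mixture


print(to_one_hot("siren"))  # [0.0, 1.0, 0.0, 0.0]
print(to_one_hot(2))        # [0.0, 0.0, 1.0, 0.0]
```

Normalizing every query form to one representation lets the downstream model accept a single conditioning input regardless of how the user specifies the class.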

Most implemented papers

Real-Time Target Sound Extraction

vb000/waveformer 4 Nov 2022

We present the first neural network model to achieve real-time and streaming target sound extraction.

Target Sound Extraction with Variable Cross-modality Clues

lichenda/multi-clue-tse-data 15 Mar 2023

Automatic target sound extraction (TSE) is a machine learning approach that mimics the human auditory ability to attend to a sound source of interest within a mixture of sources.

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

JHU-LCAP/DPM-TSE 6 Oct 2023

Common target sound extraction (TSE) approaches have primarily relied on discriminative methods to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background.

CLAPSep: Leveraging Contrastive Pre-trained Models for Multi-Modal Query-Conditioned Target Sound Extraction

aisaka0v0/clapsep 27 Feb 2024

Query-conditioned TSE systems consist of two components: a query network that converts user queries into conditional embeddings, and a separation network that extracts the target sound based on those embeddings.
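The two-component design described in the CLAPSep summary can be illustrated with a toy, untrained sketch: a query network maps a query vector to an embedding, and a separation network uses that embedding to predict a per-bin mask over mixture magnitude frames. Everything here is an assumption for illustration: the class names `QueryNetwork` and `SeparationNetwork`, the random linear projection standing in for a pre-trained query encoder, and the FiLM-style scale/shift conditioning followed by a sigmoid mask. It is not the architecture of CLAPSep or of any other paper listed here.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class QueryNetwork:
    """Hypothetical query encoder: random linear projection of a query vector to an embedding."""

    def __init__(self, query_dim: int, embed_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(query_dim)] for _ in range(embed_dim)]

    def __call__(self, query: Vector) -> Vector:
        return linear(self.weights, query)


class SeparationNetwork:
    """Hypothetical separator: embedding -> per-bin scale/shift -> sigmoid mask on each frame."""

    def __init__(self, num_bins: int, embed_dim: int, seed: int = 1):
        rng = random.Random(seed)
        self.to_scale = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(num_bins)]
        self.to_shift = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(num_bins)]

    def __call__(self, mixture_frames: Matrix, embedding: Vector) -> Matrix:
        # The conditional embedding decides how each frequency bin is modulated.
        scale = linear(self.to_scale, embedding)
        shift = linear(self.to_shift, embedding)
        estimate = []
        for frame in mixture_frames:
            mask = [sigmoid(g * x + b) for g, x, b in zip(scale, frame, shift)]
            estimate.append([m * x for m, x in zip(mask, frame)])
        return estimate


# Usage: 4-dimensional one-hot query, 8-dim embedding, 2 frames x 5 frequency bins.
query_net = QueryNetwork(query_dim=4, embed_dim=8)
sep_net = SeparationNetwork(num_bins=5, embed_dim=8)
mixture = [[0.2, 1.0, 0.5, 0.0, 0.3], [0.1, 0.9, 0.4, 0.2, 0.0]]  # magnitude frames
embedding = query_net([0.0, 1.0, 0.0, 0.0])
estimate = sep_net(mixture, embedding)
```

Masking the mixture magnitudes keeps every estimated bin between zero and the corresponding mixture value, a common design choice in mask-based separators. A real system would learn both networks' weights and compute the frames from the input waveform.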