Streaming Target Sound Extraction

1 paper with code • 1 benchmark • 1 dataset

This task is a variant of Target Sound Extraction that adds the constraint of causal, streaming inference. To keep the algorithmic latency below 20 ms, a streaming audio model processes its input one chunk at a time, and each chunk is shorter than 20 ms. The causal constraint means that, at each time step, the model can use only the current chunk and past chunks, never future ones.
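The chunked, causal processing described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 16 kHz sample rate, the 10 ms chunk length, and the names `CausalSmoother`, `stream`, and `chunk_latency_ms` are hypothetical, and the one-pole filter is only a stand-in for a real extraction network. It is not the method of any paper listed here.

```python
SAMPLE_RATE = 16_000  # Hz; assumed value for illustration
CHUNK_MS = 10         # chunk duration; chosen to stay below the 20 ms budget
CHUNK_SIZE = SAMPLE_RATE * CHUNK_MS // 1000  # samples per chunk


class CausalSmoother:
    """Hypothetical stand-in for a streaming model: a one-pole low-pass filter.

    It carries a single value of state forward from past samples and
    never reads samples that have not arrived yet.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.state = 0.0  # summary of everything seen so far

    def process_chunk(self, chunk):
        out = []
        for x in chunk:
            # Each output depends only on the current sample and past state.
            self.state = self.alpha * x + (1.0 - self.alpha) * self.state
            out.append(self.state)
        return out


def stream(samples, model, chunk_size=CHUNK_SIZE):
    """Feed samples to the model one chunk at a time and join the outputs."""
    output = []
    for start in range(0, len(samples), chunk_size):
        output.extend(model.process_chunk(samples[start:start + chunk_size]))
    return output


def chunk_latency_ms(chunk_size=CHUNK_SIZE, sample_rate=SAMPLE_RATE):
    """Time needed to buffer one chunk, in milliseconds."""
    return 1000.0 * chunk_size / sample_rate


if __name__ == "__main__":
    audio = [(i % 7) / 7.0 for i in range(3 * CHUNK_SIZE)]
    extracted = stream(audio, CausalSmoother())
    print(len(extracted), chunk_latency_ms())
```

Because the model keeps its state between calls, changing the samples in a later chunk cannot alter the output already produced for an earlier chunk, which is the causality property in practice. The buffering time returned by `chunk_latency_ms` is only the part of the latency due to chunking; any lookahead or compute time in a real system would add to it.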

Most implemented papers

Real-Time Target Sound Extraction

vb000/waveformer 4 Nov 2022

We present the first neural network model to achieve real-time and streaming target sound extraction.