Sound Event Detection
46 papers with code • 4 benchmarks • 17 datasets
Sound Event Detection (SED) is the task of recognizing the sound events in a recording together with their start and end times. Sound events in real life do not always occur in isolation; they often overlap considerably with one another. Recognizing such overlapping sound events is referred to as polyphonic SED.
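As a rough illustration of the task output described above, the sketch below turns frame-wise class scores into events with onset and offset times. It assumes a model has already produced one score per frame and class; the function name `frames_to_events`, the 0.5 threshold, the 0.5-second hop, and the example labels and scores are all hypothetical and are not taken from any paper listed here.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, float, float]  # (label, onset_seconds, offset_seconds)


def frames_to_events(
    activity: Sequence[Sequence[float]],
    labels: Sequence[str],
    hop_seconds: float,
    threshold: float = 0.5,
) -> List[Event]:
    """Convert frame-wise class scores into (label, onset, offset) events."""
    events: List[Event] = []
    num_frames = len(activity)
    for class_index, label in enumerate(labels):
        onset_frame = None  # frame where the current run of activity began
        for frame_index in range(num_frames + 1):
            # Treat the position past the last frame as inactive so that
            # a run reaching the end of the recording is still closed.
            active = (
                frame_index < num_frames
                and activity[frame_index][class_index] >= threshold
            )
            if active and onset_frame is None:
                onset_frame = frame_index
            elif not active and onset_frame is not None:
                events.append(
                    (label, onset_frame * hop_seconds, frame_index * hop_seconds)
                )
                onset_frame = None
    # Order events by onset time, then by label for a stable result.
    return sorted(events, key=lambda event: (event[1], event[0]))


# Hypothetical scores for 5 frames and 2 classes (columns: speech, car).
# Both classes are active in frames 1 and 2, which is the polyphonic case.
scores = [
    [0.9, 0.1],
    [0.8, 0.7],
    [0.6, 0.9],
    [0.2, 0.8],
    [0.1, 0.2],
]
events = frames_to_events(scores, ["speech", "car"], hop_seconds=0.5)
for label, onset, offset in events:
    print(f"{label}: {onset:.1f}s to {offset:.1f}s")
```

Each class is decoded independently, so two events may share the same frames. Published systems typically add steps such as median filtering or class-specific thresholds, which this sketch leaves out.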
Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
Hypercomplex neural networks have been shown to reduce the overall number of parameters while maintaining valuable performance by leveraging the properties of Clifford algebras.
In this paper we present an approach to polyphonic sound event detection in real-life recordings based on bi-directional long short-term memory (BLSTM) recurrent neural networks (RNNs).
To foster the investigation of label noise in sound event classification, we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.
The understanding of the surrounding environment plays a critical role in autonomous robotic systems, such as self-driving cars.
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
Conventional NN-based methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target.
The recently proposed Mean Teacher method, which exploits large-scale unlabeled data in a self-ensembling manner, has achieved state-of-the-art results in several semi-supervised learning benchmarks.
Sound event detection (SED), as a core module of acoustic environmental analysis, suffers from the problem of data deficiency.
Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure.