Sound Event Detection
74 papers with code • 4 benchmarks • 18 datasets
Sound Event Detection (SED) is the task of recognizing sound events and their respective temporal start and end times in a recording. Sound events in real life do not always occur in isolation, but tend to overlap considerably with each other. Recognizing such overlapping sound events is referred to as polyphonic SED.
Source: A report on sound event detection with different binaural features
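Since SED systems typically produce per-frame, per-class scores that must be turned into timestamped events, a minimal sketch of that conversion may help. The function name `frames_to_events`, the hop size, and the threshold below are illustrative assumptions, not from any specific system:

```python
import numpy as np

FRAME_HOP = 0.02  # assumed hop size in seconds per frame

def frames_to_events(probs, threshold=0.5, hop=FRAME_HOP):
    """Binarize per-class frame probabilities and merge consecutive
    active frames into (class, start_time, end_time) events."""
    active = probs >= threshold
    events = []
    for cls in range(active.shape[1]):
        start = None
        for t, on in enumerate(active[:, cls]):
            if on and start is None:
                start = t
            elif not on and start is not None:
                events.append((cls, start * hop, t * hop))
                start = None
        if start is not None:  # event still active at clip end
            events.append((cls, start * hop, active.shape[0] * hop))
    return events

# Polyphonic example: class 0 active in frames 0-4, class 1 in
# frames 2-7, so the two events overlap in time.
probs = np.zeros((10, 2))
probs[0:5, 0] = 0.9
probs[2:8, 1] = 0.8
print(frames_to_events(probs))
```

Because each class is binarized independently, overlapping (polyphonic) events pose no structural problem at this stage; the difficulty lies in producing reliable scores for simultaneous sounds.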
Libraries
Use these libraries to find Sound Event Detection models and implementations.
Latest papers
Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning
The bioacoustic community recast the problem of sound event detection within the framework of few-shot learning, i.e., training a system with only a few labeled examples.
Post-Processing Independent Evaluation of Sound Event Detection Systems
The proposed evaluation summarizes system performance over a range of operating modes, obtained by varying the decision threshold used to translate the system's output scores into a binary detection output.
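The idea of sweeping the decision threshold to obtain a set of operating points can be sketched in a few lines. This is a simplified frame-level illustration, not the paper's exact metric; the function name `operating_points` and the toy data are assumptions:

```python
import numpy as np

def operating_points(scores, labels, thresholds):
    """For each decision threshold, binarize frame scores and report
    (threshold, true-positive rate, false-positive rate)."""
    points = []
    for th in thresholds:
        pred = scores >= th
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        tpr = tp / max(np.sum(labels == 1), 1)  # recall at this threshold
        fpr = fp / max(np.sum(labels == 0), 1)
        points.append((th, float(tpr), float(fpr)))
    return points

# Toy frame scores and ground-truth activity labels.
scores = np.array([0.9, 0.8, 0.4, 0.3, 0.1])
labels = np.array([1, 1, 1, 0, 0])
for th, tpr, fpr in operating_points(scores, labels, [0.2, 0.5]):
    print(f"th={th}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Summarizing performance over such a curve, rather than at one fixed threshold, decouples the evaluation from any particular post-processing choice.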
Few-shot bioacoustic event detection at the DCASE 2023 challenge
Few-shot bioacoustic event detection consists of detecting sound events of specified types, in varying soundscapes, while having access to only a few examples of the class of interest.
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.
Adversarial Representation Learning for Robust Privacy Preservation in Audio
In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings.
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.
AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection
The proposed output format enables the model to handle the polyphony problem, regardless of the number of overlapping sounds.
A dataset for Audio-Visual Sound Event Detection in Movies
In this work, we present a dataset of audio events called Subtitle-Aligned Movie Sounds (SAM-S).
Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks
We present a novel approach to automatically detect and classify great ape calls from continuous raw audio recordings collected during field research.
On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors
Out-of-distribution (OOD) detection is concerned with identifying data points that do not belong to the same distribution as the model's training data.