Sound Event Localization and Detection
28 papers with code • 5 benchmarks • 8 datasets
Given multichannel audio input, a sound event localization and detection (SELD) system outputs a temporal activation track for each of the target sound classes, along with one or more corresponding spatial trajectories when the track indicates activity. This yields a spatio-temporal characterization of the acoustic scene that can be used in a wide range of machine cognition tasks, such as inference on the type of environment, self-localization, navigation without visual input or with occluded targets, tracking of specific types of sound sources, smart-home applications, scene visualization systems, and audio surveillance, among others.
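To make the output format described above concrete, here is a minimal sketch of decoding a frame-wise SELD prediction into discrete events. The format (per-frame class activity probabilities plus a 3D direction-of-arrival unit vector per class) and all names are illustrative assumptions, not tied to any specific paper on this page:

```python
import numpy as np

# Assumed illustrative SELD output format: per time frame, a detection
# probability for each sound class and a 3D direction-of-arrival (DOA)
# unit vector for that class. Random values stand in for model output.
rng = np.random.default_rng(0)
n_frames, n_classes = 4, 3

activity = rng.random((n_frames, n_classes))        # detection probabilities
doa = rng.normal(size=(n_frames, n_classes, 3))     # raw DOA vectors
doa /= np.linalg.norm(doa, axis=-1, keepdims=True)  # normalize to unit length

def decode_seld(activity, doa, threshold=0.5):
    """Return (frame, class, azimuth_deg, elevation_deg) for active events."""
    events = []
    for t, c in zip(*np.where(activity > threshold)):
        x, y, z = doa[t, c]
        azimuth = np.degrees(np.arctan2(y, x))                    # [-180, 180]
        elevation = np.degrees(np.arcsin(np.clip(z, -1.0, 1.0)))  # [-90, 90]
        events.append((int(t), int(c), azimuth, elevation))
    return events

events = decode_seld(activity, doa)
```

Variants of this representation exist; for example, some systems fold activity into the DOA vector's magnitude instead of predicting it separately, so the detection threshold is applied to the vector norm.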
Most implemented papers
What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis
Sound event localization and detection (SELD) is an emerging research topic that aims to unify the tasks of sound event detection and direction-of-arrival estimation.
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection
Sound event localization and detection (SELD) consists of two subtasks, which are sound event detection and direction-of-arrival estimation.
Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection
Data augmentation methods have shown great importance in diverse supervised learning problems where labeled data is scarce or costly to obtain.
Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head
Sound event localization and detection (SELD) is a combined task of identifying the sound event and its direction.
Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments
Our goal is to develop a sound event localization and detection (SELD) system that works robustly in unknown environments.
L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment
The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments.
Filler Word Detection and Classification: A Dataset and Benchmark
In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts such as breaths, laughter, and word repetitions.
Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation
We show that our dual quaternion SELD model with temporal convolution blocks (DualQSELD-TCN) achieves better results with respect to real and quaternion-valued baselines thanks to our augmented representation of the sound field.
A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks
Most existing methods for training SNNs are based on the concept of synaptic plasticity; however, learning in the realistic brain also utilizes intrinsic non-synaptic mechanisms of neurons.
Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains
Our system submitted to the DCASE 2022 Task 3 is based on our previous proposed Event-Independent Network V2 (EINV2) with a novel data augmentation method.