STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset for sound event localization and detection, consisting of spatial recordings of real scenes collected in various interior spaces at two different sites. The dataset is captured with a high-resolution spherical microphone array and delivered in two 4-channel formats: first-order Ambisonics (FOA) and tetrahedral microphone array (MIC). Sound events in the dataset belonging to 13 target sound classes are annotated both temporally and spatially through a combination of human annotation and optical tracking. The dataset serves as the development and evaluation dataset for Task 3 of the DCASE2022 Challenge on Sound Event Localization and Detection, and it introduces significant new challenges for the task compared to the previous iterations, which were based on synthetic spatialized sound scene recordings. The report details the dataset specifications, including the recording and annotation process, the target classes and their presence, and the development and evaluation splits. Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on its differences from the baseline of the previous iterations: namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional, improved input features for the microphone array format. The baseline results indicate that, with a suitable training strategy, reasonable detection and localization performance can be achieved on real sound scene recordings. The dataset is available at https://zenodo.org/record/6387880.
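The multi-ACCDOA idea mentioned in the abstract can be illustrated with a minimal, hypothetical sketch (not the baseline's code). Each output slot holds a Cartesian direction-of-arrival (DOA) vector for one track and one class, and the vector's length doubles as the activity, so several same-class events can occupy different tracks. The track count (3), the 0.5 activity threshold, the azimuth/elevation convention, and all function names are assumptions; the permutation-invariant training that multi-ACCDOA relies on to cope with arbitrary track order is not shown.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Assumptions for illustration: 3 tracks per class and the 13 STARSS22 target classes.
NUM_TRACKS = 3
NUM_CLASSES = 13


def azel_to_unit(azimuth_deg: float, elevation_deg: float) -> Vec3:
    """Convert azimuth/elevation in degrees to a Cartesian unit vector (assumed convention)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def encode_frame(events: List[Tuple[int, float, float]]) -> List[List[Vec3]]:
    """Build a hypothetical multi-ACCDOA target for one frame.

    events: (class_idx, azimuth_deg, elevation_deg) for every active event.
    Returns target[track][class]: a unit vector if active, else a zero vector.
    """
    target = [[(0.0, 0.0, 0.0)] * NUM_CLASSES for _ in range(NUM_TRACKS)]
    next_track = [0] * NUM_CLASSES  # next free track for each class
    for cls, az, el in events:
        track = next_track[cls]
        if track >= NUM_TRACKS:
            continue  # more same-class events than tracks: extras are dropped here
        target[track][cls] = azel_to_unit(az, el)
        next_track[cls] += 1
    return target


def decode_frame(output: List[List[Vec3]], threshold: float = 0.5) -> List[Tuple[int, int, float, float]]:
    """Return (track, class, azimuth_deg, elevation_deg) for slots whose vector norm exceeds threshold."""
    detections = []
    for track in range(NUM_TRACKS):
        for cls in range(NUM_CLASSES):
            x, y, z = output[track][cls]
            norm = math.sqrt(x * x + y * y + z * z)
            if norm > threshold:  # vector length encodes activity
                azimuth = math.degrees(math.atan2(y, x))
                elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
                detections.append((track, cls, azimuth, elevation))
    return detections


if __name__ == "__main__":
    # Two simultaneous events of class 0 plus one event of class 5.
    frame = [(0, 90.0, 0.0), (0, 0.0, 0.0), (5, -45.0, 30.0)]
    print(decode_frame(encode_frame(frame)))
```

In this sketch the two class-0 events land in tracks 0 and 1 instead of overwriting each other, which is exactly the situation a single-track ACCDOA output cannot represent.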


Datasets


Introduced in the Paper:

STARSS22

Used in the Paper:

FSD50K
Task: Sound Event Localization and Detection; Dataset: STARSS22

Model            Metric Name                                  Metric Value   Global Rank
Baseline (FOA)   Localization-dependent error rate (20°)      71             #1
Baseline (FOA)   location-dependent F1-score (macro)          21             #1
Baseline (FOA)   location-dependent F1-score (micro)          0.36           #1
Baseline (FOA)   Class-dependent localization error           29.3           #1
Baseline (FOA)   Class-dependent localization recall          46             #2
Baseline (MIC)   location-dependent F1-score (macro)          18             #2
Baseline (MIC)   location-dependent F1-score (micro)          0.36           #1
Baseline (MIC)   Class-dependent localization error           32.2           #2
Baseline (MIC)   Class-dependent localization recall          47             #1
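The location-dependent metrics above count a detection as correct only when its class matches a reference event and its DOA lies within 20° of it, while the class-dependent localization error averages the angular error of matched same-class pairs. The following is a minimal sketch of that idea for one class in one frame; the function names and the brute-force search (standing in for an optimal assignment step) are assumptions, and the official DCASE evaluation code differs in details such as temporal aggregation.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]


def angular_distance_deg(u: Vec3, v: Vec3) -> float:
    """Angle in degrees between two DOA vectors (they need not be unit length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.degrees(math.acos(cos_angle))


def match_class(preds: List[Vec3], refs: List[Vec3], threshold_deg: float = 20.0) -> Tuple[int, List[float]]:
    """Hypothetical matcher for ONE class in ONE frame.

    Returns (number of location-dependent true positives, angular errors of matched pairs).
    """
    if not preds or not refs:
        return 0, []
    # Let `short` be the smaller list so every element of it gets a partner.
    short, long_ = (preds, refs) if len(preds) <= len(refs) else (refs, preds)
    # Brute-force search for the assignment with the smallest total angular error;
    # acceptable here because only a few same-class events overlap in a frame.
    errors = min(
        ([angular_distance_deg(short[i], long_[j]) for i, j in enumerate(perm)]
         for perm in permutations(range(len(long_)), len(short))),
        key=sum,
    )
    num_tp = sum(1 for e in errors if e <= threshold_deg)
    return num_tp, errors


if __name__ == "__main__":
    refs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    preds = [(0.0, 1.0, 0.1), (1.0, 0.1, 0.0)]
    print(match_class(preds, refs))
```

In the usage example both matched pairs are about 5.7° apart, so both count as true positives under the 20° threshold, and those two errors are what a class-dependent localization error would average.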
