L3DAS21 is a dataset for 3D audio signal processing. It consists of a 65 hours 3D audio corpus, accompanied with a Python API that facilitates the data usage and results submission stage.
The LEDAS21 datasets contain multiple-source and multiple-perspective B-format Ambisonics audio recordings. The authors sampled the acoustic field of a large office room, placing two first-order Ambisonics microphones in the center of the room and moving a speaker reproducing the analytic signal in 252 fixed spatial positions. Relying on the collected Ambisonics impulse responses (IRs), the authors augmented existing clean monophonic datasets to obtain synthetic tridimensional sound sources by convolving the original sounds with our IRs.
The dataset is divided in two main sections, respectively dedicated to the challenge tasks.
The first section is optimized for 3D Speech Enhancement and contains more than 30000 virtual 3D audio environments with a duration up to 10 seconds. In each sample, a spoken voice is always present alongside with other office-like background noises. As target data for this section the authors provide the clean monophonic voice signals.
The other sections, instead, is dedicated to the 3D Sound Event Localization and Detection task and contains 900 60-seconds-long audio files. Each data point contains a simulated 3D office audio environment in which up to 3 simultaneous acoustic events may be active at the same time. In this section, the samples are not forced to contain a spoken voice. As target data for this section the authors provide a list of the onset and offset time stamps, the typology class, and the spatial coordinates of each individual sound event present in the data-points.Source: L3DAS21