no code implementations • 18 Mar 2024 • Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros
Sound Event Localization and Detection (SELD) is a combined task of identifying sound events and their corresponding direction-of-arrival (DOA).
1 code implementation • 5 Feb 2024 • Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros
In Self-Supervised Learning (SSL), Audio-Visual Correspondence (AVC) is a popular task to learn deep audio and video features from large unlabeled datasets.
no code implementations • 9 Jan 2024 • Manjunath Mulimani, Annamaria Mesaros
Experiments are performed on a dataset with 50 sound classes, with an initial classification task containing 30 base classes and 4 incremental phases of 5 classes each.
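The class split described above can be checked with a short sketch. It only illustrates the arithmetic (30 base classes plus 4 phases of 5 classes gives 50); the function name `incremental_schedule` and the assignment of consecutive class indices to phases are assumptions made for illustration, since the snippet does not say which classes fall into which phase.

```python
def incremental_schedule(num_base, num_phases, classes_per_phase):
    """Return a list of tasks, each a list of class indices.

    Assumption: classes are assigned to tasks in index order.
    """
    schedule = [list(range(num_base))]  # initial task with the base classes
    start = num_base
    for _ in range(num_phases):
        # each incremental phase introduces a fresh block of classes
        schedule.append(list(range(start, start + classes_per_phase)))
        start += classes_per_phase
    return schedule


# Numbers taken from the abstract: 30 base classes, 4 phases of 5 classes.
schedule = incremental_schedule(30, 4, 5)
total_classes = sum(len(task) for task in schedule)
```

Here `schedule[0]` holds the 30 base classes and `schedule[1:]` holds the four incremental phases, which together cover all 50 classes.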
no code implementations • 25 Sep 2023 • Manu Harju, Annamaria Mesaros
Classification systems are normally trained by minimizing the cross-entropy between system outputs and reference labels, which makes the Kullback-Leibler divergence a natural choice for measuring how closely the system can follow the data.
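The link between cross-entropy and the Kullback-Leibler divergence mentioned above follows from the identity H(p, q) = H(p) + KL(p || q): for fixed reference labels p, minimizing the cross-entropy is the same as minimizing the KL divergence. The following is a minimal numerical sketch of that identity, not the evaluation procedure of the paper; the two distributions are made-up values chosen for illustration.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q > 0 wherever p > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical values: a reference label distribution and a system output.
reference = [0.7, 0.2, 0.1]
output = [0.6, 0.3, 0.1]

# The excess of the cross-entropy over the entropy of the reference
# equals the KL divergence between reference and output.
gap = cross_entropy(reference, output) - entropy(reference)
```

Because `entropy(reference)` does not depend on the system, `gap` and `kl_divergence(reference, output)` agree up to floating-point error, and both are zero only when the output matches the reference.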
no code implementations • 28 Feb 2023 • Irene Martín-Morató, Manu Harju, Paul Ahokas, Annamaria Mesaros
In this paper, we study the use of soft labels to train a system for sound event detection (SED).
no code implementations • 28 Feb 2023 • Manjunath Mulimani, Annamaria Mesaros
At the same time, its performance on the previous ASC task decreases only by 5.1 percentage points due to the additional learning of the AT task.
1 code implementation • 10 Nov 2022 • Shanshan Wang, Soumya Tripathy, Annamaria Mesaros
To improve the discriminative ability of feature embeddings in SSL, we propose a new loss function called Angular Contrastive Loss (ACL), a linear combination of angular margin and contrastive loss.
no code implementations • 13 Sep 2022 • Daniel Aleksander Krause, Annamaria Mesaros
Sound event detection (SED) and acoustic scene classification (ASC) are two widely researched audio tasks that constitute an important part of research on acoustic scene analysis.
no code implementations • 8 Jun 2022 • Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, Tuomas Virtanen
The provided baseline system is a convolutional neural network which employs post-training quantization of parameters, resulting in 46.5 K parameters, and 29.23 million multiply-and-accumulate operations (MMACs).
no code implementations • 2 Jun 2022 • Shanshan Wang, Archontis Politis, Annamaria Mesaros, Tuomas Virtanen
In addition to the correspondence, AVSA also learns from the spatial location of acoustic and visual content.
no code implementations • 26 Jul 2021 • Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros
Finally, we propose various ways of combining the proximity and direction estimation problems into a joint task providing temporal information about the onsets and offsets of the appearing sources.
no code implementations • 26 Jul 2021 • Irene Martín-Morató, Manu Harju, Annamaria Mesaros
Strong labels are a necessity for evaluation of sound event detection methods, but often scarcely available due to the high resources required by the annotation task.
1 code implementation • 12 Jul 2021 • Annamaria Mesaros, Toni Heittola, Tuomas Virtanen, Mark D. Plumbley
The goal of automatic sound event detection (SED) methods is to recognize what is happening in an audio signal and when it is happening.
1 code implementation • 28 May 2021 • Irene Martín-Morató, Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
The most used techniques among the submissions were residual networks and weight quantization, with the top systems reaching over 70% accuracy, and log loss under 0.8.
no code implementations • 28 May 2021 • Shanshan Wang, Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
More importantly, multi-modal methods using both audio and video are employed by all the top 5 teams.
no code implementations • 9 Apr 2021 • Irene Martín-Morató, Annamaria Mesaros
Crowdsourcing has become a common approach for annotating large amounts of data.
4 code implementations • 6 Sep 2020 • Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen
A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training of learning-based approaches, and for evaluation of the submissions in an unlabeled subset.
no code implementations • 29 May 2020 • Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
This paper presents the details of Task 1: Acoustic Scene Classification in the DCASE 2020 Challenge.
2 code implementations • 25 Jul 2018 • Annamaria Mesaros, Toni Heittola, Tuomas Virtanen
This paper introduces the acoustic scene classification task of the DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system in the task.