8 code implementations • 30 Jun 2018 • Sharath Adavanne, Archontis Politis, Joonas Nikunen, Tuomas Virtanen
In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space.
Sound • Audio and Speech Processing
1 code implementation • 29 Apr 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN).
3 code implementations • 21 May 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge.
Sound • Audio and Speech Processing
2 code implementations • 2 Jun 2020 • Archontis Politis, Sharath Adavanne, Tuomas Virtanen
This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge.
4 code implementations • 6 Sep 2020 • Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen
A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training learning-based approaches and for evaluating the submissions on an unlabeled subset.
2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen
Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on the differences from the baseline of the previous iterations: namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format.
Ranked #1 on Sound Event Localization and Detection on STARSS22
1 code implementation • 13 Jun 2021 • Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, Tuomas Virtanen
This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD).
2 code implementations • 29 Oct 2021 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly posed as a classification or a regression problem.
1 code implementation • NeurIPS 2023 • Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji
While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded with a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker.
1 code implementation • 26 Mar 2024 • Michael Neri, Archontis Politis, Daniel Krause, Marco Carli, Tuomas Virtanen
Distance estimation from audio plays a crucial role in various applications, such as acoustic scene analysis, sound source localization, and room modeling.
no code implementations • 29 Jan 2018 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the performance of SED using the proposed method is compared to study the potential of SED using multichannel audio.
no code implementations • 27 Oct 2017 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources.
no code implementations • 22 Jun 2021 • Shanshan Wang, Gaurav Naithani, Archontis Politis, Tuomas Virtanen
Time-frequency masking or spectrum prediction computed via short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation.
no code implementations • 28 Jun 2021 • Pasi Pertilä, Emre Cakir, Aapo Hakala, Eemi Fagerlund, Tuomas Virtanen, Archontis Politis, Antti Eronen
Joint sound event localization and detection (SELD) is an integral part of developing context awareness into communication interfaces of mobile robots, smartphones, and home assistants.
no code implementations • 26 Jul 2021 • Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros
Finally, we propose various ways of combining the proximity and direction estimation problems into a joint task providing temporal information about the onsets and offsets of the appearing sources.
no code implementations • 2 Jun 2022 • Shanshan Wang, Archontis Politis, Annamaria Mesaros, Tuomas Virtanen
In addition to the correspondence, AVSA also learns from the spatial location of acoustic and visual content.
no code implementations • 26 Oct 2022 • David Diaz-Guerra, Archontis Politis, Tuomas Virtanen
Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios.
no code implementations • 14 Mar 2023 • Wang Dai, Archontis Politis, Tuomas Virtanen
Specifically, each mask is used to multiply the corresponding channel's 2D representation, and the masked outputs of all channels are then summed.
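A minimal sketch of the mask-and-sum step described above, assuming each channel's 2D representation and its mask are equally sized nested lists. The function name `mask_and_sum` and the toy values are hypothetical and are not taken from the paper.

```python
def mask_and_sum(channels, masks):
    """Multiply each channel's 2D representation by its own mask
    element-wise, then sum the masked outputs over all channels."""
    if len(channels) != len(masks):
        raise ValueError("need exactly one mask per channel")
    n_rows = len(channels[0])
    n_cols = len(channels[0][0])
    combined = [[0.0] * n_cols for _ in range(n_rows)]
    for channel, mask in zip(channels, masks):
        for i in range(n_rows):
            for j in range(n_cols):
                combined[i][j] += channel[i][j] * mask[i][j]
    return combined


# Toy input (assumed values): two channels, each a 2x2 representation.
channels = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
masks = [
    [[1.0, 0.0], [0.5, 1.0]],
    [[0.0, 1.0], [0.5, 0.0]],
]
combined = mask_and_sum(channels, masks)
print(combined)
```

The result has the shape of a single channel, since the sum runs over the channel axis only.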
no code implementations • 14 Jun 2023 • David Diaz-Guerra, Archontis Politis, Antonio Miguel, Jose R. Beltran, Tuomas Virtanen
Conventional recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs) or gated recurrent units (GRUs), take a vector as their input and use another vector to store their state.
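To illustrate the vector-in, vector-state convention described above, here is a minimal sketch of a plain (Elman-style) recurrent step, used as a simpler stand-in for an LSTM or GRU. The function `rnn_step`, the weights, and the toy sequence are assumptions chosen for illustration only.

```python
import math


def rnn_step(x, h, w_in, w_rec, bias):
    """One recurrent step: the input x and the state h are both plain
    vectors, and the returned new state is again a vector."""
    new_h = []
    for k in range(len(h)):
        pre = bias[k]
        pre += sum(w_in[k][i] * x[i] for i in range(len(x)))
        pre += sum(w_rec[k][j] * h[j] for j in range(len(h)))
        new_h.append(math.tanh(pre))
    return new_h


# Assumed toy parameters: 3-dimensional input, 2-dimensional state.
w_in = [[0.5, -0.5, 0.0], [0.0, 0.5, 0.5]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]
bias = [0.0, 0.0]

sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = [0.0, 0.0]  # the state vector carried from step to step
for x in sequence:
    state = rnn_step(x, state, w_in, w_rec, bias)
print(state)
```

Whatever the input length, the network's memory of the sequence is held entirely in the fixed-size `state` vector.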
no code implementations • 17 Dec 2023 • Yuzhu Wang, Archontis Politis, Tuomas Virtanen
The clean speech clips from WSJ0 are employed for simulating speech signals of moving speakers in a reverberant environment.
no code implementations • 11 Jan 2024 • Mikko Heikkinen, Archontis Politis, Tuomas Virtanen
Ambisonics encoding of microphone array signals can enable various spatial audio applications, such as virtual reality or telepresence, but it is typically designed for uniformly-spaced spherical microphone arrays.
no code implementations • 24 Jan 2024 • Christoph Hold, Leo McCormack, Archontis Politis, Ville Pulkki
Scene-based spatial audio formats, such as Ambisonics, are playback system agnostic and may therefore be favoured for delivering immersive audio experiences to a wide range of (potentially unknown) devices.
no code implementations • 18 Mar 2024 • Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros
Sound Event Detection and Localization (SELD) is a combined task of identifying sound events and their corresponding direction-of-arrival (DOA).