8 code implementations • 30 Jun 2018 • Sharath Adavanne, Archontis Politis, Joonas Nikunen, Tuomas Virtanen
In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space.
Sound • Audio and Speech Processing
1 code implementation • 29 Apr 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN).
3 code implementations • 21 May 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge.
Sound • Audio and Speech Processing
2 code implementations • 2 Jun 2020 • Archontis Politis, Sharath Adavanne, Tuomas Virtanen
This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge.
4 code implementations • 6 Sep 2020 • Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen
A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training learning-based approaches and, in an unlabeled subset, for evaluating the submissions.
2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen
Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on its differences from the baselines of previous iterations: namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional, improved input features for the microphone array format.
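As a rough illustration of the multi-ACCDOA idea, the sketch below encodes each (track, class) pair as a Cartesian DOA unit vector scaled by the event's activity (0 or 1), and decodes an event as active when the vector's length exceeds a threshold. The function names, the in-order track filling, and the 0.5 threshold are assumptions made for illustration; this is not the baseline's code.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Event = Tuple[int, float, float]  # (class index, azimuth deg, elevation deg)


def doa_to_unit(azimuth_deg: float, elevation_deg: float) -> Vec3:
    """Convert azimuth/elevation in degrees to a Cartesian unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def encode_multi_accdoa(events: List[Event], num_tracks: int,
                        num_classes: int) -> List[List[Vec3]]:
    """Build a num_tracks x num_classes grid of activity-scaled DOA vectors
    for one frame. Assumption: same-class events simply fill tracks in order;
    a trained system needs a proper track-assignment strategy (e.g.
    permutation-invariant training)."""
    target = [[(0.0, 0.0, 0.0) for _ in range(num_classes)]
              for _ in range(num_tracks)]
    next_track = [0] * num_classes
    for cls, az, el in events:
        t = next_track[cls]
        if t >= num_tracks:
            continue  # more same-class overlaps than tracks: drop the extra event
        target[t][cls] = doa_to_unit(az, el)  # activity 1 -> unit-length vector
        next_track[cls] += 1
    return target


def decode_multi_accdoa(output: List[List[Vec3]],
                        threshold: float = 0.5) -> List[Tuple[int, int, Vec3]]:
    """Return (track, class, unit DOA) for every vector longer than threshold;
    the vector length acts as the activity estimate."""
    detections = []
    for t, row in enumerate(output):
        for c, (x, y, z) in enumerate(row):
            norm = math.sqrt(x * x + y * y + z * z)
            if norm > threshold:
                detections.append((t, c, (x / norm, y / norm, z / norm)))
    return detections
```

For example, two simultaneous events of class 2 at azimuths 90° and -90° land on tracks 0 and 1 of the same class column, which a single-track ACCDOA output could not represent. Coupling activity and direction in one vector lets a single regression output serve both detection and localization.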
Ranked #1 on Sound Event Localization and Detection on STARSS22
1 code implementation • 13 Jun 2021 • Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, Tuomas Virtanen
This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD).
2 code implementations • 29 Oct 2021 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions and is commonly formulated as either a classification or a regression problem.
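The two formulations differ mainly in the training target. This minimal sketch is restricted to azimuth for brevity. It assumes a uniform 10° grid for the classification case and a continuous point on the unit circle for the regression case. The function names and grid resolution are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple


def azimuth_to_class(azimuth_deg: float, resolution_deg: float = 10.0) -> int:
    """Classification target: index of the azimuth bin on a uniform grid."""
    num_bins = int(round(360.0 / resolution_deg))
    wrapped = azimuth_deg % 360.0  # map any angle into [0, 360)
    return int(wrapped // resolution_deg) % num_bins


def class_to_azimuth(index: int, resolution_deg: float = 10.0) -> float:
    """Decode a class index to the azimuth at the centre of its bin."""
    return index * resolution_deg + resolution_deg / 2.0


def multi_hot(azimuths: List[float], resolution_deg: float = 10.0) -> List[int]:
    """Multi-source classification target: one active bin per source."""
    num_bins = int(round(360.0 / resolution_deg))
    target = [0] * num_bins
    for az in azimuths:
        target[azimuth_to_class(az, resolution_deg)] = 1
    return target


def azimuth_to_regression(azimuth_deg: float) -> Tuple[float, float]:
    """Regression target: (x, y) on the unit circle, which avoids the
    0/360 degree wrap-around discontinuity of a raw angle target."""
    az = math.radians(azimuth_deg)
    return (math.cos(az), math.sin(az))


def regression_to_azimuth(x: float, y: float) -> float:
    """Decode a regressed (x, y) point back to an azimuth in [0, 360)."""
    return math.degrees(math.atan2(y, x)) % 360.0
```

The classification output is limited by the grid (up to half a bin of quantization error) but extends naturally to multiple sources through multi-hot targets. The regression output is continuous but needs a fixed number of outputs per source.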
1 code implementation • NeurIPS 2023 • Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji
While the direction of arrival (DOA) of sound events is generally estimated from multichannel audio recorded with a microphone array, sound events usually originate from visually perceptible source objects, e.g., the sound of footsteps comes from the feet of a walker.
no code implementations • 29 Jan 2018 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the SED performance of the proposed method is compared to study the potential of SED using multichannel audio.
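For readers unfamiliar with the four-channel first-order Ambisonic (FOA) format, the sketch below encodes a mono signal arriving as a far-field plane wave from a given direction into four FOA channels. It assumes the ACN channel ordering (W, Y, Z, X) with SN3D normalization, i.e. the AmbiX convention. Other conventions, such as FuMa, order and scale the channels differently, and the datasets in the paper may not use this exact convention.

```python
import math
from typing import List, Tuple


def foa_gains(azimuth_deg: float,
              elevation_deg: float) -> Tuple[float, float, float, float]:
    """First-order real spherical-harmonic gains, ACN order (W, Y, Z, X),
    SN3D normalization (assumed convention)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = 1.0                           # omnidirectional component
    y = math.sin(az) * math.cos(el)   # left-right figure-of-eight
    z = math.sin(el)                  # up-down figure-of-eight
    x = math.cos(az) * math.cos(el)   # front-back figure-of-eight
    return (w, y, z, x)


def encode_foa(signal: List[float], azimuth_deg: float,
               elevation_deg: float) -> List[List[float]]:
    """Encode a mono signal as a plane wave into 4 FOA channels."""
    gains = foa_gains(azimuth_deg, elevation_deg)
    return [[g * s for s in signal] for g in gains]
```

A source at azimuth 90° (to the left) and elevation 0° appears fully in the W and Y channels and (up to floating-point error) not at all in Z or X. These level differences across the four channels are the spatial cues a multichannel SED model can exploit.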
no code implementations • 27 Oct 2017 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources.
no code implementations • 30 Jun 2017 • Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen
The encoder is a multi-layered, bidirectional gated recurrent unit (GRU), and the decoder is a multi-layered GRU with a classification layer connected to its last GRU layer.
no code implementations • 7 Jun 2017 • Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, Tuomas Virtanen
In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task.
no code implementations • 7 Jun 2017 • Miroslav Malik, Sharath Adavanne, Konstantinos Drossos, Tuomas Virtanen, Dasa Ticha, Roman Jarina
This paper studies emotion recognition from musical tracks in the two-dimensional valence-arousal (V-A) emotional space.
no code implementations • 7 Jun 2017 • Sharath Adavanne, Pasi Pertilä, Tuomas Virtanen
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection.
no code implementations • 7 Jun 2017 • Sharath Adavanne, Konstantinos Drossos, Emre Çakır, Tuomas Virtanen
This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks.
no code implementations • 7 Mar 2017 • Emre Çakır, Sharath Adavanne, Giambattista Parascandolo, Konstantinos Drossos, Tuomas Virtanen
Bird sounds possess distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions.
no code implementations • 21 Jun 2021 • Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavanne
Due to the inconsistency between the non-native English pronunciation in the audio and the native English lexicon in the text, the intelligibility of synthesized speech in such TTS systems is significantly reduced.
no code implementations • 9 Jun 2022 • Arun Baby, Saranya Vinnaitherthan, Akhil Kerhalkar, Pranav Jawale, Sharath Adavanne, Nagaraj Adiga
We proposed two methods to determine a suitable cost function for retrieving out-of-vocabulary (OOV) words based on the context.
Automatic Speech Recognition (ASR)