Search Results for author: Sharath Adavanne

Found 19 papers, 9 papers with code

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

8 code implementations • 30 Jun 2018 • Sharath Adavanne, Archontis Politis, Joonas Nikunen, Tuomas Virtanen

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space.

Sound Audio and Speech Processing

Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network

1 code implementation • 29 Apr 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN).

A multi-room reverberant dataset for sound event localization and detection

3 code implementations • 21 May 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge.

Sound Audio and Speech Processing

A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

2 code implementations • 2 Jun 2020 • Archontis Politis, Sharath Adavanne, Tuomas Virtanen

This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge.

Sound Event Localization and Detection

Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019

4 code implementations • 6 Sep 2020 • Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen

A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training of learning-based approaches, and for evaluation of the submissions in an unlabeled subset.

Data Augmentation Sound Event Localization and Detection

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

Additionally, the report presents the baseline system that accompanies the dataset in the challenge with emphasis on the differences with the baseline of the previous iterations; namely, introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format.

Ranked #1 on Sound Event Localization and Detection on STARSS22

Sound Event Localization and Detection

A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection

1 code implementation • 13 Jun 2021 • Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, Tuomas Virtanen

This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD).

Sound Event Localization and Detection

Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers

2 code implementations • 29 Oct 2021 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly set as a classification or a regression problem.

Classification Direction of Arrival Estimation +2

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

1 code implementation • NeurIPS 2023 • Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker.

Sound Event Localization and Detection

Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features

no code implementations • 29 Jan 2018 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the performance of SED using the proposed method is compared to study the potential of SED using multichannel audio.

Event Detection Sound Event Detection

Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network

no code implementations • 27 Oct 2017 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources.

Direction of Arrival Estimation

Automated Audio Captioning with Recurrent Neural Networks

no code implementations • 30 Jun 2017 • Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen

The encoder is a multi-layered, bi-directional gated recurrent unit (GRU) and the decoder a multi-layered GRU with a classification layer connected to the last GRU of the decoder.

Audio captioning General Classification +3

Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features

no code implementations • 7 Jun 2017 • Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, Tuomas Virtanen

In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task.

Event Detection Sound Event Detection

Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition

no code implementations • 7 Jun 2017 • Miroslav Malik, Sharath Adavanne, Konstantinos Drossos, Tuomas Virtanen, Dasa Ticha, Roman Jarina

This paper studies emotion recognition from musical tracks in the 2-dimensional valence-arousal (V-A) emotional space.

Emotion Recognition Music Emotion Recognition

Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network

no code implementations • 7 Jun 2017 • Sharath Adavanne, Pasi Pertilä, Tuomas Virtanen

This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection.

Event Detection Sound Event Detection

Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection

no code implementations • 7 Jun 2017 • Sharath Adavanne, Konstantinos Drossos, Emre Çakır, Tuomas Virtanen

This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks.

Bird Audio Detection Data Augmentation +1

Convolutional Recurrent Neural Networks for Bird Audio Detection

no code implementations • 7 Mar 2017 • Emre Çakır, Sharath Adavanne, Giambattista Parascandolo, Konstantinos Drossos, Tuomas Virtanen

Bird sounds possess distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions.

Bird Audio Detection

Non-native English lexicon creation for bilingual speech synthesis

no code implementations • 21 Jun 2021 • Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavanne

Due to the inconsistency between the non-native English pronunciation in the audio and native English lexicon in the text, the intelligibility of synthesized speech in such TTS systems is significantly reduced.

Speech Synthesis

Context-based out-of-vocabulary word recovery for ASR systems in Indian languages

no code implementations • 9 Jun 2022 • Arun Baby, Saranya Vinnaitherthan, Akhil Kerhalkar, Pranav Jawale, Sharath Adavanne, Nagaraj Adiga

We proposed two methods to determine a suitable cost function to retrieve the OOV words based on the context.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
