Search Results for author: Sharath Adavanne

Found 19 papers, 9 papers with code

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

8 code implementations • 30 Jun 2018 • Sharath Adavanne, Archontis Politis, Joonas Nikunen, Tuomas Virtanen

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space.

Sound • Audio and Speech Processing
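A joint SELD network of this kind typically produces, per time frame, a sound-event-detection (SED) probability per class and a regressed Cartesian DOA vector per class; the sketch below shows one way the two outputs can be combined into detections. This is a minimal stdlib-only illustration with hypothetical inputs and threshold, not the paper's actual implementation.

```python
import math

def decode_seld_frame(sed_probs, doa_xyz, threshold=0.5):
    """Combine per-class SED probabilities with per-class Cartesian DOA
    estimates for one frame into (class, azimuth, elevation) detections.

    sed_probs: per-class activity probabilities (e.g. sigmoid outputs)
    doa_xyz:   per-class (x, y, z) DOA regression outputs
    """
    detections = []
    for cls, (p, (x, y, z)) in enumerate(zip(sed_probs, doa_xyz)):
        if p < threshold:  # class considered inactive in this frame
            continue
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
        detections.append((cls, azimuth, elevation))
    return detections

# Hypothetical frame: class 1 active, its DOA pointing along +y
print(decode_seld_frame([0.1, 0.9], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))
```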

Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network

1 code implementation • 29 Apr 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN).

A multi-room reverberant dataset for sound event localization and detection

3 code implementations • 21 May 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge.

Sound • Audio and Speech Processing

A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

2 code implementations • 2 Jun 2020 • Archontis Politis, Sharath Adavanne, Tuomas Virtanen

This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge.

Sound Event Localization and Detection

Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019

4 code implementations • 6 Sep 2020 • Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen

A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training learning-based approaches and for evaluating the submissions on an unlabeled subset.

Data Augmentation • Sound Event Localization and Detection

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on the differences from the baselines of previous iterations: namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format.

Sound Event Localization and Detection
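In the ACCDOA family of representations, each class (and, in multi-ACCDOA, each of several tracks per class) is a single Cartesian vector whose norm encodes event activity and whose direction encodes the DOA, which is what lets two simultaneous events of the same class occupy different tracks. A minimal stdlib-only decoding sketch, with a hypothetical frame and threshold, not the challenge baseline's code:

```python
import math

def decode_multi_accdoa(vectors, threshold=0.5):
    """Decode one frame of a multi-ACCDOA-style output.

    vectors: per class, a list of track vectors (x, y, z); a track is
    active when its norm exceeds the threshold, and its direction
    (normalized) gives the DOA.
    """
    events = []
    for cls, tracks in enumerate(vectors):
        for (x, y, z) in tracks:
            norm = math.sqrt(x * x + y * y + z * z)
            if norm <= threshold:  # inactive track
                continue
            events.append((cls, x / norm, y / norm, z / norm))
    return events

# Hypothetical frame: class 0 has two simultaneous sources,
# class 1 stays below the activity threshold
frame = [
    [(1.0, 0.0, 0.0), (0.0, -1.0, 0.0)],
    [(0.1, 0.1, 0.0)],
]
print(decode_multi_accdoa(frame))
```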

Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers

2 code implementations • 29 Oct 2021 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly set as a classification or a regression problem.

Classification • Direction of Arrival Estimation • +2
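Whether SSL is framed as classification or regression, localization quality is commonly measured as the great-circle angle between the predicted and reference DOA unit vectors. A stdlib-only sketch of that metric (an illustration, not the paper's evaluation code):

```python
import math

def angular_error_deg(u, v):
    """Great-circle angle in degrees between two DOA unit vectors,
    a common localization-error metric in SSL evaluation."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # clamp against rounding drift
    return math.degrees(math.acos(dot))

# Orthogonal directions are 90 degrees apart; identical ones, 0
print(angular_error_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(angular_error_deg((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
```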

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

1 code implementation • NeurIPS 2023 • Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

While the direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded with a microphone array, sound events usually originate from visually perceptible source objects, e.g., the sound of footsteps comes from the feet of a walker.

Sound Event Localization and Detection

Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features

no code implementations • 29 Jan 2018 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen

Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the performance of SED using the proposed method is compared to study the potential of SED with multichannel audio.

Event Detection • Sound Event Detection

Automated Audio Captioning with Recurrent Neural Networks

no code implementations • 30 Jun 2017 • Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen

The encoder is a multi-layered, bi-directional gated recurrent unit (GRU), and the decoder is a multi-layered GRU with a classification layer connected to its last GRU layer.

Audio Captioning • General Classification • +3

Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features

no code implementations • 7 Jun 2017 • Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, Tuomas Virtanen

In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task.

Event Detection • Sound Event Detection

Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection

no code implementations • 7 Jun 2017 • Sharath Adavanne, Konstantinos Drossos, Emre Çakır, Tuomas Virtanen

This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks.

Bird Audio Detection • Data Augmentation • +1

Convolutional Recurrent Neural Networks for Bird Audio Detection

no code implementations • 7 Mar 2017 • Emre Çakır, Sharath Adavanne, Giambattista Parascandolo, Konstantinos Drossos, Tuomas Virtanen

Bird sounds possess a distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions.

Bird Audio Detection

Non-native English lexicon creation for bilingual speech synthesis

no code implementations • 21 Jun 2021 • Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavanne

Due to the inconsistency between the non-native English pronunciation in the audio and native English lexicon in the text, the intelligibility of synthesized speech in such TTS systems is significantly reduced.

Speech Synthesis
