Search Results for author: Archontis Politis

Found 25 papers, 10 papers with code

Gaunt coefficients for complex and real spherical harmonics with applications to spherical array processing and Ambisonics

no code implementations · 9 Jul 2024 · Archontis Politis

Additionally, the report provides a derivation of the Gaunt coefficients for real SHs, which has been missing from the literature and can be used directly in spatial audio frameworks such as Ambisonics.
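For context, the complex-SH Gaunt coefficient discussed in the paper is conventionally defined as the integral of a triple product of spherical harmonics, with a closed form in terms of Wigner 3j symbols (a standard result; normalization conventions vary, so the form below is one common choice, not necessarily the report's):

```latex
G_{l_1 l_2 l_3}^{m_1 m_2 m_3}
  = \int_{\mathbb{S}^2} Y_{l_1}^{m_1}(\Omega)\, Y_{l_2}^{m_2}(\Omega)\, Y_{l_3}^{m_3}(\Omega)\, \mathrm{d}\Omega
  = \sqrt{\frac{(2l_1+1)(2l_2+1)(2l_3+1)}{4\pi}}
    \begin{pmatrix} l_1 & l_2 & l_3 \\ 0 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} l_1 & l_2 & l_3 \\ m_1 & m_2 & m_3 \end{pmatrix}
```

The report's contribution concerns the analogous coefficients for real SHs, for which no such closed form had been collected in the literature.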

Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement

no code implementations · 5 Jun 2024 · Wang Dai, Xiaofei Li, Archontis Politis, Tuomas Virtanen

The experimental results on the SPEAR challenge simulated dataset D4 demonstrate the superiority of the proposed method over the conventional approach of using a fixed reference channel with single-channel masking.

Speech Enhancement

Speaker Distance Estimation in Enclosures from Single-Channel Audio

1 code implementation · 26 Mar 2024 · Michael Neri, Archontis Politis, Daniel Krause, Marco Carli, Tuomas Virtanen

Distance estimation from audio plays a crucial role in various applications, such as acoustic scene analysis, sound source localization, and room modeling.

Sound Event Detection and Localization with Distance Estimation

no code implementations · 18 Mar 2024 · Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros

Sound Event Detection and Localization (SELD) is a combined task of identifying sound events and their corresponding direction-of-arrival (DOA).

Event Detection · Sound Event Detection

Perceptually-motivated Spatial Audio Codec for Higher-Order Ambisonics Compression

no code implementations · 24 Jan 2024 · Christoph Hold, Leo McCormack, Archontis Politis, Ville Pulkki

Scene-based spatial audio formats, such as Ambisonics, are playback system agnostic and may therefore be favoured for delivering immersive audio experiences to a wide range of (potentially unknown) devices.

Decoder

Neural Ambisonics encoding for compact irregular microphone arrays

no code implementations · 11 Jan 2024 · Mikko Heikkinen, Archontis Politis, Tuomas Virtanen

Ambisonics encoding of microphone array signals can enable various spatial audio applications, such as virtual reality or telepresence, but it is typically designed for uniformly spaced spherical microphone arrays.
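As a concrete illustration of the conventional (non-neural) encoding model such work builds on, an ideal plane wave can be encoded into first-order Ambisonics by weighting the signal with real spherical-harmonic gains. This is a minimal sketch assuming ACN channel order and SN3D normalization; the signal length and source direction are arbitrary:

```python
import numpy as np

def foa_encode(signal, azi, ele):
    """Encode a mono signal into first-order Ambisonics (ACN/SN3D)
    for an ideal plane wave arriving from (azi, ele) in radians."""
    # Real first-order spherical harmonics, ACN channel order: W, Y, Z, X
    w = 1.0
    y = np.sin(azi) * np.cos(ele)
    z = np.sin(ele)
    x = np.cos(azi) * np.cos(ele)
    gains = np.array([w, y, z, x])
    return gains[:, None] * signal[None, :]   # shape (4, n_samples)

s = np.random.randn(8)
foa = foa_encode(s, azi=np.pi / 2, ele=0.0)   # source directly to the left
assert foa.shape == (4, 8)
assert np.allclose(foa[0], s)      # W (omni) carries the signal
assert np.allclose(foa[1], s)      # Y gain is 1 at azimuth +90 deg
assert np.allclose(foa[3], 0.0)    # X gain vanishes there
```

Real microphone arrays deviate from this ideal model, which is what motivates learned encoders for compact irregular geometries.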

Attention-Driven Multichannel Speech Enhancement in Moving Sound Source Scenarios

no code implementations · 17 Dec 2023 · Yuzhu Wang, Archontis Politis, Tuomas Virtanen

The clean speech clips from WSJ0 are employed for simulating speech signals of moving speakers in a reverberant environment.

Speech Enhancement

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

1 code implementation · NeurIPS 2023 · Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

While the direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded with a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker.

Sound Event Localization and Detection

Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications

no code implementations · 14 Jun 2023 · David Diaz-Guerra, Archontis Politis, Antonio Miguel, Jose R. Beltran, Tuomas Virtanen

Conventional recurrent neural networks (RNNs), such as the long short-term memories (LSTMs) or the gated recurrent units (GRUs), take a vector as their input and use another vector to store their state.
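As a concrete illustration of the conventional cell this paper contrasts against, a vanilla (Elman) RNN update is simply a vector-in, vector-out state recurrence. This is a minimal numpy sketch of that generic update, not the paper's proposed permutation-invariant cell; the dimensions are arbitrary:

```python
import numpy as np

# A conventional RNN cell: both the input and the recurrent state are
# flat vectors, so information about multiple tracked sources must share
# one unordered state vector.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5
W_x = rng.standard_normal((n_hidden, n_in))
W_h = rng.standard_normal((n_hidden, n_hidden))
b = np.zeros(n_hidden)

def rnn_step(x, h):
    """One Elman-RNN update: new state vector from input x and state h."""
    return np.tanh(W_x @ x + W_h @ h + b)

h = np.zeros(n_hidden)
for x in rng.standard_normal((10, n_in)):   # a short input sequence
    h = rnn_step(x, h)
assert h.shape == (n_hidden,)
```

LSTMs and GRUs add gating on top of this scheme but keep the same vector-valued state, which is the structural limitation the paper addresses for multi-source tracking.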

Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

no code implementations · 14 Mar 2023 · Wang Dai, Archontis Politis, Tuomas Virtanen

Specifically, each mask multiplies the corresponding channel's 2D representation element-wise, and the masked outputs of all channels are then summed.
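The masking-and-summing step described above can be sketched as follows (the shapes and the [0, 1] mask range are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

# Multi-channel masking: one mask per input channel multiplies that
# channel's 2D (time x feature) representation element-wise; the masked
# channels are then summed into a single-channel estimate.
rng = np.random.default_rng(0)
n_ch, n_frames, n_bins = 4, 20, 64
X = rng.standard_normal((n_ch, n_frames, n_bins))        # per-channel representations
masks = rng.uniform(0.0, 1.0, (n_ch, n_frames, n_bins))  # e.g. network outputs in [0, 1]

masked = masks * X              # element-wise mask per channel
estimate = masked.sum(axis=0)   # sum across channels
assert estimate.shape == (n_frames, n_bins)
```

Summing across channels lets the masks act both as per-channel separators and as implicit spatial weights, in contrast to masking a single fixed reference channel.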

Position tracking of a varying number of sound sources with sliding permutation invariant training

no code implementations · 26 Oct 2022 · David Diaz-Guerra, Archontis Politis, Tuomas Virtanen

Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios.

Position

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

2 code implementations · 4 Jun 2022 · Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on the differences from the baselines of previous iterations: namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format.

Sound Event Localization and Detection

Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers

2 code implementations · 29 Oct 2021 · Sharath Adavanne, Archontis Politis, Tuomas Virtanen

Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly set as a classification or a regression problem.

Classification · Direction of Arrival Estimation · +2

Joint Direction and Proximity Classification of Overlapping Sound Events from Binaural Audio

no code implementations · 26 Jul 2021 · Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros

Finally, we propose various ways of combining the proximity and direction estimation problems into a joint task providing temporal information about the onsets and offsets of the appearing sources.

Mobile Microphone Array Speech Detection and Localization in Diverse Everyday Environments

no code implementations · 28 Jun 2021 · Pasi Pertilä, Emre Cakir, Aapo Hakala, Eemi Fagerlund, Tuomas Virtanen, Archontis Politis, Antti Eronen

Joint sound event localization and detection (SELD) is an integral part of developing context awareness into communication interfaces of mobile robots, smartphones, and home assistants.

Sound Event Localization and Detection

Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair

no code implementations · 22 Jun 2021 · Shanshan Wang, Gaurav Naithani, Archontis Politis, Tuomas Virtanen

Time-frequency masking or spectrum prediction computed via short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation.

Clustering · Deep Clustering · +2

Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019

4 code implementations · 6 Sep 2020 · Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen

A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training of learning-based approaches, and for evaluation of the submissions in an unlabeled subset.

Data Augmentation · Sound Event Localization and Detection

A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

2 code implementations · 2 Jun 2020 · Archontis Politis, Sharath Adavanne, Tuomas Virtanen

This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge.

Sound Event Localization and Detection

A multi-room reverberant dataset for sound event localization and detection

3 code implementations · 21 May 2019 · Sharath Adavanne, Archontis Politis, Tuomas Virtanen

This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge.

Sound · Audio and Speech Processing

Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network

1 code implementation · 29 Apr 2019 · Sharath Adavanne, Archontis Politis, Tuomas Virtanen

This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN).

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

8 code implementations · 30 Jun 2018 · Sharath Adavanne, Archontis Politis, Joonas Nikunen, Tuomas Virtanen

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space.

Sound · Audio and Speech Processing

Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features

no code implementations · 29 Jan 2018 · Sharath Adavanne, Archontis Politis, Tuomas Virtanen

Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the performance of SED using the proposed method is compared to study the potential of SED using multichannel audio.

Event Detection · Sound Event Detection
