We propose two methods for determining a suitable cost function to retrieve out-of-vocabulary (OOV) words based on context.
2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen
Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on the differences from the baselines of the previous iterations; namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format.
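The multi-ACCDOA representation mentioned above encodes, for each class and each of a fixed number of tracks, a 3D Cartesian vector whose direction gives the DOA and whose norm doubles as an activity indicator. A minimal sketch of that target encoding is below; the function name, the frame-less simplification, and the default track/class counts are illustrative assumptions (the actual baseline builds per-frame targets with permutation-invariant training), not the authors' implementation.

```python
import numpy as np

def multi_accdoa_target(events, n_tracks=3, n_classes=13):
    """Hypothetical encoder: turn (track, class, azimuth_deg, elevation_deg)
    events into a (n_tracks, n_classes, 3) tensor of Cartesian DOA vectors.
    Inactive slots stay at the zero vector; active slots hold a unit vector
    pointing at the source, so the vector norm doubles as an activity flag."""
    target = np.zeros((n_tracks, n_classes, 3))
    for track, cls, az_deg, el_deg in events:
        az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
        target[track, cls] = [np.cos(el) * np.cos(az),   # x
                              np.cos(el) * np.sin(az),   # y
                              np.sin(el)]                # z
    return target
```

Two simultaneous events of the same class simply occupy two different tracks, which is exactly the case the representation was introduced to handle.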
Ranked #1 on Sound Event Localization and Detection on STARSS22
Data-based and learning-based sound source localization (SSL) methods have shown promising results in challenging conditions, and SSL is commonly formulated as either a classification or a regression problem.
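To make the classification-versus-regression distinction concrete, the sketch below builds both kinds of training target for a single source azimuth: a one-hot vector over a discretized angle grid, and a continuous unit-vector target. The grid step and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ssl_targets(azimuth_deg, grid_step=10):
    """Illustrative SSL targets for one source azimuth.
    Classification: one-hot over a discrete azimuth grid (grid_step degrees).
    Regression: 2D unit vector (cos, sin), which avoids the 0/360 wrap-around."""
    n_bins = 360 // grid_step
    one_hot = np.zeros(n_bins)
    one_hot[int(round(azimuth_deg / grid_step)) % n_bins] = 1.0
    rad = np.deg2rad(azimuth_deg)
    regression = np.array([np.cos(rad), np.sin(rad)])
    return one_hot, regression
```

The classification form caps angular resolution at the grid step, while the regression form is continuous but needs a wrap-aware parameterization, which is why the unit-vector encoding is used here instead of the raw angle.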
Due to the inconsistency between the non-native English pronunciation in the audio and the native English lexicon in the text, the intelligibility of synthesized speech in such TTS systems is significantly reduced.
This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD).
A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training learning-based approaches and for evaluating submissions on an unlabeled subset.
This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge.
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge.
Sound Audio and Speech Processing
This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN).
In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space.
Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the performance of SED with the proposed method is compared to study the potential of SED using multichannel audio.
This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources.
The encoder is a multi-layered, bi-directional gated recurrent unit (GRU), and the decoder is a multi-layered GRU with a classification layer connected to its last GRU layer.
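Both the encoder and decoder stacks above are built from GRU cells. As a refresher on the building block, here is a single GRU step in plain NumPy; the weight naming is conventional (update gate z, reset gate r) and biases are omitted for brevity, so this is a didactic sketch rather than the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step (biases omitted for brevity).
    x: (input_dim,) input, h: (hidden_dim,) previous state.
    W*: (hidden_dim, input_dim) input weights, U*: (hidden_dim, hidden_dim)
    recurrent weights for the update (z), reset (r), and candidate (h) paths."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate: keep old vs. new
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate: gate old state
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde         # interpolate old and candidate
```

A bi-directional layer, as in the encoder, simply runs one such cell forward and another backward over the sequence and concatenates the two state sequences.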
This paper studies emotion recognition from musical tracks in the two-dimensional valence-arousal (V-A) emotional space.
In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task.
This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks.
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection.
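One widely used low-level spatial feature for multichannel audio is the generalized cross-correlation with phase transform (GCC-PHAT), whose peak lag reflects the inter-channel time difference of arrival. The sketch below is a minimal NumPy version offered as illustration; the paper's exact feature set may differ.

```python
import numpy as np

def gcc_phat(x1, x2, n_lags=16):
    """GCC-PHAT between two channels: the cross-power spectrum is whitened
    by its magnitude (PHAT weighting), then transformed back to the lag
    domain. The peak lag estimates the delay of x1 relative to x2.
    Returns correlations for lags -n_lags..+n_lags (length 2*n_lags + 1)."""
    n = len(x1) + len(x2)                      # zero-pad to avoid wrap-around
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12             # PHAT whitening: keep phase only
    cc = np.fft.irfft(cross, n)
    return np.concatenate([cc[-n_lags:], cc[:n_lags + 1]])
```

Because the whitening discards magnitude and keeps only phase, the lag-domain output approaches an impulse at the true delay, which makes the feature comparatively robust to reverberation; stacking such GCC vectors per microphone pair yields a compact spatial input for a sound event detector.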
Bird sounds possess a distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions.