2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen
Additionally, the report presents the baseline system that accompanies the dataset in the challenge with emphasis on the differences with the baseline of the previous iterations; namely, introduction of the multi-ACCDOA representation to handle multiple simultaneous occurences of events of the same class, and support for additional improved input features for the microphone array format.
Ranked #1 on Sound Event Localization and Detection on STARSS22
1 code implementation • NeurIPS 2023 • Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji
While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e. g., sounds of footsteps come from the feet of a walker.
1 code implementation • 29 Nov 2019 • Naoya Takahashi, Mayank Kumar Singh, Sakya Basak, Parthasaarathy Sudarsanam, Sriram Ganapathy, Yuki Mitsufuji
Despite recent advances in voice separation methods, many challenges remain in realistic scenarios such as noisy recording and the limits of available data.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 20 Apr 2022 • Samuel Lipping, Parthasaarathy Sudarsanam, Konstantinos Drossos, Tuomas Virtanen
Audio question answering (AQA) is a multimodal translation task where a system analyzes an audio signal and a natural language question, to generate a desirable natural language answer.
no code implementations • 31 May 2023 • Parthasaarathy Sudarsanam, Tuomas Virtanen
On the yes/no binary classification task, our proposed model achieves an accuracy of 68. 3% compared to 62. 7% in the reference model.