no code implementations • 9 Sep 2024 • Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki
We present a distant automatic speech recognition (DASR) system developed for the CHiME-8 DASR track.
no code implementations • 5 Feb 2024 • Marvin Tammen, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, Simon Doclo
Although mask-based beamforming is a powerful speech enhancement approach, it often requires manual parameter tuning to handle moving speakers.
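As background, mask-based beamformers typically pool time-frequency masks into speech and noise spatial covariance estimates and derive a filter such as MVDR from them; the parameters being tuned are often the masking and pooling choices. The following numpy sketch is illustrative only (function and variable names are assumptions, not code from the paper):

```python
import numpy as np

def mask_based_mvdr(Y, speech_mask, noise_mask):
    """Mask-based MVDR sketch. Y: (F, M, T) multichannel STFT;
    masks: (F, T) with values in [0, 1]."""
    F, M, T = Y.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Yf = Y[f]  # (M, T)
        # Mask-weighted spatial covariance matrices.
        Phi_s = (speech_mask[f] * Yf) @ Yf.conj().T / max(speech_mask[f].sum(), 1e-8)
        Phi_n = (noise_mask[f] * Yf) @ Yf.conj().T / max(noise_mask[f].sum(), 1e-8)
        # Steering vector: principal eigenvector of the speech covariance.
        v = np.linalg.eigh(Phi_s)[1][:, -1]
        # MVDR weights: Phi_n^{-1} v, normalized for a distortionless response.
        w = np.linalg.solve(Phi_n, v)
        w /= (v.conj() @ w)
        out[f] = w.conj() @ Yf
    return out
```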
no code implementations • 20 Nov 2023 • Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada, Shoji Makino
However, this training objective may not be optimal for a specific array processing back-end, such as beamforming.
no code implementations • 8 Aug 2023 • Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani
TSE is realized by conditioning the extraction process on a clue identifying the target speaker.
no code implementations • 29 Jun 2023 • Ning Guo, Tomohiro Nakatani, Shoko Araki, Takehiro Moriya
This paper introduces a novel low-latency online beamforming (BF) algorithm, named Modified Parametric Multichannel Wiener Filter (Mod-PMWF), for enhancing speech mixtures with unknown and varying number of speakers.
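For context, the conventional parametric multichannel Wiener filter (PMWF) that Mod-PMWF builds on can be written per frequency as follows (the standard formulation; the paper's specific modification is not reproduced here):

$$
\mathbf{w}_f = \frac{\boldsymbol{\Phi}_{\mathbf{n},f}^{-1}\,\boldsymbol{\Phi}_{\mathbf{s},f}}{\beta + \operatorname{tr}\!\left(\boldsymbol{\Phi}_{\mathbf{n},f}^{-1}\boldsymbol{\Phi}_{\mathbf{s},f}\right)}\,\mathbf{u}_{\mathrm{ref}},
$$

where $\boldsymbol{\Phi}_{\mathbf{s},f}$ and $\boldsymbol{\Phi}_{\mathbf{n},f}$ are the target and noise spatial covariance matrices, $\mathbf{u}_{\mathrm{ref}}$ selects a reference microphone, and $\beta \ge 0$ trades speech distortion against noise reduction ($\beta = 0$ yields the MVDR beamformer, $\beta = 1$ the multichannel Wiener filter).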
no code implementations • 23 May 2023 • Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget, Shoko Araki
Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods.
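In outline, EEND-VC runs EEND on short chunks and then links the local speakers across chunks by clustering per-speaker embeddings. A hypothetical sketch follows (eend_model, its output shapes, and the clustering choice are assumptions, not the paper's implementation):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def eend_vc(chunks, eend_model, n_speakers):
    """chunks: iterable of audio chunks; eend_model(chunk) is assumed to
    return (activity (T, S_local), embeddings (S_local, D))."""
    acts, embs, sizes = [], [], []
    for chunk in chunks:
        a, e = eend_model(chunk)
        acts.append(a); embs.append(e); sizes.append(e.shape[0])
    # Vector clustering: group local-speaker embeddings into global identities.
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(np.vstack(embs))
    # Relabel each chunk's local speakers with their global cluster ids.
    out, i = [], 0
    for a, s in zip(acts, sizes):
        g = np.zeros((a.shape[0], n_speakers))
        for local in range(s):
            g[:, labels[i + local]] = np.maximum(g[:, labels[i + local]], a[:, local])
        out.append(g); i += s
    return np.vstack(out)  # (total_T, n_speakers) global speech activities
```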
no code implementations • 7 May 2022 • Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki
We thus introduce a learning-based framework that computes optimal attention weights for beamforming.
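Schematically, the idea amounts to replacing fixed mask-based pooling with learned weights when estimating the spatial covariance matrices a beamformer consumes (notation here is assumed, not the paper's):

$$
\widehat{\boldsymbol{\Phi}}_f = \sum_t \alpha_{t,f}\,\mathbf{x}_{t,f}\mathbf{x}_{t,f}^{\mathsf{H}},
$$

where the attention weights $\alpha_{t,f}$ are produced by a neural network trained so that the resulting beamformer optimizes the downstream objective, rather than being fixed by hand-tuned masking rules.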
no code implementations • 11 Apr 2022 • Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato, Tomohiro Nakatani
Target speech extraction (TSE) extracts the speech of a target speaker in a mixture given auxiliary clues characterizing the speaker, such as an enrollment utterance.
no code implementations • 2 Feb 2022 • Rintaro Ikeshita, Tomohiro Nakatani
Although the time complexity per iteration of ISS is $m$ times smaller than that of IP, conventional ISS converges more slowly than the current fastest IP (called $\text{IP}_2$), which updates two rows of $W$ in each iteration.
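To make the complexity claim concrete, the two update rules can be contrasted in standard AuxIVA notation (a sketch of the well-known forms, not the paper's derivation):

$$
\text{IP:}\quad \mathbf{w}_k \leftarrow (W V_k)^{-1}\mathbf{e}_k,\qquad \mathbf{w}_k \leftarrow \mathbf{w}_k \big/ \sqrt{\mathbf{w}_k^{\mathsf{H}} V_k \mathbf{w}_k} \qquad (O(m^3)\ \text{per source}),
$$
$$
\text{ISS:}\quad \mathbf{w}_j \leftarrow \mathbf{w}_j - v_{jk}\,\mathbf{w}_k \ \ \text{for } j = 1,\dots,m \qquad (O(m^2)\ \text{per source}),
$$

where $V_k$ is the weighted sample covariance for source $k$. ISS replaces the $m \times m$ linear solve of IP with a rank-1 update, which is the source of the factor-$m$ gap per iteration.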
no code implementations • 20 Nov 2021 • Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo, Shoko Araki
This paper develops a framework that can perform denoising, dereverberation, and source separation accurately by using a relatively small number of microphones.
Automatic Speech Recognition (ASR) +3
no code implementations • 4 Aug 2021 • Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki
This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that can jointly perform denoising (DN), dereverberation (DR), and source separation (SS).
Automatic Speech Recognition (ASR) +2
1 code implementation • 7 Jun 2021 • Christopher Schymura, Benedikt Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa
Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e.g., a microphone array).
no code implementations • 17 Apr 2021 • Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani
Many subjective experiments have been performed to develop objective speech intelligibility measures, but the novel coronavirus outbreak has made it very difficult to conduct experiments in a laboratory.
1 code implementation • 28 Feb 2021 • Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa
Herein, attention mechanisms capture temporal dependencies in the audio signal by focusing on the frames most relevant for estimating the activity and direction-of-arrival of sound events at the current time-step.
Automatic Speech Recognition (ASR) +1
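The mechanism can be illustrated with a self-attention layer over audio frames, so that each time-step weights the frames most informative for activity and DOA estimation. A minimal PyTorch sketch (illustrative of the mechanism only, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
frames = torch.randn(8, 200, 128)  # (batch, time, feature)
# Self-attention: every time-step attends to all frames in the sequence.
context, weights = attn(frames, frames, frames)
# 'context' would feed the activity / direction-of-arrival heads;
# 'weights' (batch, 200, 200) shows which frames each step attends to.
```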
no code implementations • 23 Feb 2021 • Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian
Recently, the end-to-end approach has been successfully applied to multi-speaker speech separation and recognition in both single-channel and multichannel conditions.
1 code implementation • 23 Feb 2021 • Julio Wissing, Benedikt Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura
Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
Automatic Speech Recognition (ASR) +4
no code implementations • 9 Feb 2021 • Rintaro Ikeshita, Tomohiro Nakatani
We address a blind source separation (BSS) problem in a noisy reverberant environment in which the number of microphones $M$ is greater than the number of sources of interest, and the other noise components can be approximated as stationary and Gaussian distributed.
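The setting corresponds to the following over-determined mixture model in the STFT domain (notation assumed):

$$
\mathbf{x}_{f,t} = \sum_{k=1}^{K} \mathbf{a}_{f,k}\, s_{k,f,t} + \mathbf{n}_{f,t}, \qquad \mathbf{n}_{f,t} \sim \mathcal{N}_{\mathbb{C}}\!\left(\mathbf{0},\, \boldsymbol{\Phi}^{(\mathbf{n})}_f\right), \qquad K < M,
$$

where the noise covariance $\boldsymbol{\Phi}^{(\mathbf{n})}_f$ is time-invariant, encoding the stationary Gaussian assumption on the residual noise.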
no code implementations • 2 Feb 2021 • Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki
Recently, audio-visual target speaker extraction has been proposed, which extracts the target speech by using complementary audio and visual clues.
no code implementations • 21 Jan 2021 • Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani
Based on this approach, we present FastFCA, a computationally efficient extension of FCA.
Audio Source Separation • Sound • Audio and Speech Processing
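As background, the speed-up in FastFCA is usually attributed to a joint-diagonalization structure imposed on the full-rank spatial covariance matrices of FCA (a sketch with assumed notation):

$$
\mathbf{R}_{f,j} \approx \mathbf{P}_f\, \mathbf{D}_{f,j}\, \mathbf{P}_f^{\mathsf{H}}, \qquad \mathbf{D}_{f,j}\ \text{diagonal},
$$

so that the $M \times M$ matrix inversions required in each EM iteration reduce to elementwise operations on the diagonals.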
no code implementations • 14 Jan 2021 • Marc Delcroix, Katerina Zmolikova, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani
Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest.
no code implementations • 12 Jan 2021 • Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki
Developing microphone array technologies that work with a small number of microphones is important given the hardware constraints of many devices.
1 code implementation • 24 Nov 2020 • Katerina Zmolikova, Marc Delcroix, Lukáš Burget, Tomohiro Nakatani, Jan "Honza" Černocký
In this paper, we propose a method combining a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation.
Audio and Speech Processing
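A hedged sketch of the kind of generative model such a combination implies (symbols are assumptions, not the paper's exact notation): each time-frequency observation is explained by one source whose spectral power comes from a VAE decoder and whose spatial character comes from a full covariance:

$$
\mathbf{x}_{f,t} \mid k \;\sim\; \mathcal{N}_{\mathbb{C}}\!\left(\mathbf{0},\; v^{(k)}_{f,t}\,\mathbf{R}^{(k)}_f\right), \qquad v^{(k)} = \mathrm{Dec}_\theta\!\left(\mathbf{z}^{(k)}\right), \quad \mathbf{z}^{(k)} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
$$

where $\mathrm{Dec}_\theta$ acts as a learned speech spectral prior and the spatial covariances $\mathbf{R}^{(k)}_f$ drive the clustering.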
no code implementations • 18 Oct 2020 • Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki
We also develop a new BCD for a semiblind IVE in which the transfer functions for several super-Gaussian sources are given a priori.
no code implementations • 4 Jun 2020 • Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown.
Automatic Speech Recognition (ASR) +2
no code implementations • 9 Mar 2020 • Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani
Automatic meeting analysis is a fundamental technology required to let smart devices, for example, follow and respond to our conversations.
no code implementations • 9 Mar 2020 • Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani
With the advent of deep learning, research on noise-robust automatic speech recognition (ASR) has progressed rapidly.
Automatic Speech Recognition (ASR) +3
1 code implementation • 23 Jan 2020 • Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki
First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.
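The conditioning idea can be sketched in a few lines of PyTorch: an embedding derived from the enrollment utterance multiplicatively modulates the separator's internal features so the network focuses on the target speaker (an illustrative sketch of SpeakerBeam-style adaptation, not the released implementation):

```python
import torch
import torch.nn as nn

class SpeakerConditioning(nn.Module):
    """Multiplicative adaptation: a speaker embedding scales feature channels."""
    def __init__(self, feat_dim=256, emb_dim=128):
        super().__init__()
        self.proj = nn.Linear(emb_dim, feat_dim)

    def forward(self, features, spk_emb):
        # features: (batch, feat_dim, time); spk_emb: (batch, emb_dim)
        scale = self.proj(spk_emb).unsqueeze(-1)  # (batch, feat_dim, 1)
        return features * scale                   # bias features toward the target

x, e = torch.randn(2, 256, 1000), torch.randn(2, 128)
y = SpeakerConditioning()(x, e)  # same shape, now speaker-conditioned
```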
no code implementations • 18 Dec 2019 • Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
The rising interest in single-channel multi-speaker speech separation has sparked the development of End-to-End (E2E) approaches to multi-speaker speech recognition.
no code implementations • 30 Oct 2019 • Christoph Boeddeker, Tomohiro Nakatani, Keisuke Kinoshita, Reinhold Haeb-Umbach
We previously proposed an optimal (in the maximum likelihood sense) convolutional beamformer that can perform simultaneous denoising and dereverberation, and showed its superiority over the widely used cascade of a WPE dereverberation filter and a conventional MPDR beamformer.
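For reference, that convolutional beamformer (often called WPD) admits a closed-form per-frequency solution; with assumed notation:

$$
\bar{\mathbf{w}}_f = \frac{\mathbf{R}_f^{-1}\,\bar{\mathbf{v}}_f}{\bar{\mathbf{v}}_f^{\mathsf{H}}\,\mathbf{R}_f^{-1}\,\bar{\mathbf{v}}_f}, \qquad \mathbf{R}_f = \sum_t \frac{\bar{\mathbf{x}}_{f,t}\,\bar{\mathbf{x}}_{f,t}^{\mathsf{H}}}{\lambda_{f,t}},
$$

where $\bar{\mathbf{x}}_{f,t}$ stacks the current frame with delayed frames (so a single filter performs both dereverberation and denoising), $\lambda_{f,t}$ is the time-varying power of the desired signal, and $\bar{\mathbf{v}}_f$ is the steering vector of the direct path, zero-padded over the delayed taps.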
no code implementations • 21 Feb 2019 • Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach
While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation.
Automatic Speech Recognition (ASR) +3