no code implementations • 9 Sep 2024 • Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki
We present a distant automatic speech recognition (DASR) system developed for the CHiME-8 DASR track.
no code implementations • 23 Apr 2024 • Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri
To this end, we propose a novel analysis scheme based on an orthogonal-projection decomposition of SE errors.
Automatic Speech Recognition (ASR) +2
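For intuition, here is a minimal sketch of such an orthogonal-projection decomposition (dummy signals and illustrative variable names, not the paper's code): the SE error is split into the part explained by a linear combination of the sources and an orthogonal residual.

```python
import numpy as np

# Sketch: decompose the SE error into a component in span(speech, noise)
# and an orthogonal "artifact" residual, via least-squares projection.
rng = np.random.default_rng(0)
s = rng.standard_normal(16000)                 # clean speech (dummy)
n = rng.standard_normal(16000)                 # noise (dummy)
s_hat = 0.8 * s + 0.1 * n + 0.05 * rng.standard_normal(16000)  # SE output

e = s_hat - s                                  # total SE error
B = np.stack([s, n], axis=1)                   # basis spanned by the sources
coef, *_ = np.linalg.lstsq(B, e, rcond=None)   # project e onto span(s, n)
e_linear = B @ coef                            # error explained by the sources
e_artifact = e - e_linear                      # orthogonal artifact residual
print(np.allclose(e, e_linear + e_artifact))   # True by construction
```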
no code implementations • 20 Nov 2023 • Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri
Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence on ASR of the "processing distortion" generated by single-channel SE.
Automatic Speech Recognition (ASR) +2
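To make the joint-training setup concrete, a minimal sketch with toy stand-in modules (the model definitions, CTC loss choice, and 0.3 weight are illustrative assumptions, not the paper's recipe): the enhanced signal from the SE front-end feeds the ASR back-end, and a weighted sum of both losses is backpropagated through the whole pipeline.

```python
import torch
import torch.nn as nn

class TinySE(nn.Module):
    """Toy stand-in for a single-channel SE front-end."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(1, 1, kernel_size=9, padding=4)
    def forward(self, x):
        return self.net(x)

class TinyASR(nn.Module):
    """Toy stand-in for an ASR back-end with a CTC head."""
    def __init__(self, n_tokens=32):
        super().__init__()
        self.enc = nn.Conv1d(1, n_tokens, kernel_size=9, stride=4, padding=4)
    def forward(self, x):
        return self.enc(x).transpose(1, 2).log_softmax(-1)  # (B, T, V)

se, asr = TinySE(), TinyASR()
ctc = nn.CTCLoss()
opt = torch.optim.Adam(list(se.parameters()) + list(asr.parameters()), lr=1e-3)

noisy = torch.randn(2, 1, 1600)                # dummy noisy waveforms
clean = torch.randn(2, 1, 1600)                # dummy clean references
tokens = torch.randint(1, 32, (2, 10))         # dummy transcripts

enh = se(noisy)                                # SE output fed to the ASR loss
logp = asr(enh)
in_len = torch.full((2,), logp.size(1), dtype=torch.long)
tgt_len = torch.full((2,), 10, dtype=torch.long)
asr_loss = ctc(logp.transpose(0, 1), tokens, in_len, tgt_len)
se_loss = nn.functional.mse_loss(enh, clean)   # signal-level SE loss
(asr_loss + 0.3 * se_loss).backward()          # joint multi-task objective
opt.step()
```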
no code implementations • 20 Nov 2023 • Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada, Shoji Makino
However, this training objective may not be optimal for a specific array processing back-end, such as beamforming.
no code implementations • 2 Feb 2022 • Rintaro Ikeshita, Tomohiro Nakatani
Although the time complexity per iteration of iterative source steering (ISS) is $m$ times smaller than that of iterative projection (IP), the conventional ISS converges more slowly than the current fastest IP variant (called $\text{IP}_2$), which updates two rows of $W$ in each iteration.
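For context, the ISS update can be sketched as a rank-1 update (common AuxIVA-style notation, not necessarily the paper's): for each source index $k$, a vector $\boldsymbol{v}_k$ minimizing the auxiliary function is computed and the whole demixing matrix is refreshed at once as $W \leftarrow W - \boldsymbol{v}_k \boldsymbol{w}_k$, where $\boldsymbol{w}_k$ denotes the $k$-th row of $W$. Such an update costs $O(m^2)$ per frequency bin, whereas an IP update requires an $O(m^3)$ matrix inversion, which is where the factor-$m$ gap per iteration comes from.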
no code implementations • 18 Jan 2022 • Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri
The artifact component is defined as the SE error signal that cannot be represented as a linear combination of speech and noise sources.
Automatic Speech Recognition (ASR) +2
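In equation form (notation illustrative): writing the enhanced signal as $\hat{s}$, the clean target as $s$, and the noise as $n$, the decomposition reads $\hat{s} = \alpha s + \beta n + e_{\mathrm{artifact}}$, where $\alpha$ and $\beta$ are least-squares coefficients and $e_{\mathrm{artifact}}$ is orthogonal to both $s$ and $n$; the first two terms capture target distortion and residual interference, and only the last term is counted as artifact.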
no code implementations • 20 Nov 2021 • Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo, Shoko Araki
This paper develops a framework that can perform denoising, dereverberation, and source separation accurately by using a relatively small number of microphones.
Automatic Speech Recognition (ASR) +3
no code implementations • 4 Aug 2021 • Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki
This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that can jointly perform denoising (DN), dereverberation (DR), and source separation (SS).
Automatic Speech Recognition (ASR) +2
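As a sketch of the underlying signal model (notation illustrative, not necessarily the paper's): a convolutional beamformer applies a multichannel filter over a window of past frames, $y(t, f) = \sum_{\tau=0}^{L-1} \mathbf{w}(\tau, f)^{\mathsf{H}} \mathbf{x}(t - \tau, f)$, so that a single filter can in principle suppress noise, late reverberation, and interfering sources at the same time rather than chaining separate DN, DR, and SS stages.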
no code implementations • 9 Feb 2021 • Rintaro Ikeshita, Tomohiro Nakatani
We address a blind source separation (BSS) problem in a noisy reverberant environment in which the number of microphones $M$ exceeds the number of sources of interest, while the remaining noise components can be approximated as stationary and Gaussian distributed.
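A sketch of the assumed mixing model (notation illustrative): $\mathbf{x}(t, f) = \mathbf{A}(f)\,\mathbf{s}(t, f) + \mathbf{z}(t, f)$, where $\mathbf{x}$ stacks the $M$ microphone observations, $\mathbf{s}$ the $K < M$ sources of interest, and $\mathbf{z}$ the stationary Gaussian noise; the separation system then only has to extract the $K$ sources rather than demix all $M$ channels.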
no code implementations • 21 Jan 2021 • Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani
Based on this approach, we present FastFCA, a computationally efficient extension of full-rank spatial covariance analysis (FCA).
Audio Source Separation • Sound • Audio and Speech Processing
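The core trick can be sketched as joint diagonalization of the per-source spatial covariance matrices (a hedged illustration with dummy covariances, not the paper's code): once two covariances are jointly diagonalized via a generalized eigenproblem, subsequent per-bin updates become elementwise operations instead of repeated $M \times M$ inversions.

```python
import numpy as np
from scipy.linalg import eigh

# Sketch: jointly diagonalize two spatial covariance matrices with a
# generalized eigendecomposition (R1 v = lam * R2 v).
M = 4
rng = np.random.default_rng(0)
X = rng.standard_normal((M, 100))
R1 = X @ X.T / 100 + np.eye(M)   # covariance for source 1 (dummy)
Y = rng.standard_normal((M, 100))
R2 = Y @ Y.T / 100 + np.eye(M)   # covariance for source 2 (dummy)

lam, P = eigh(R1, R2)            # P is R2-orthonormal: P.T @ R2 @ P = I
D1 = P.T @ R1 @ P                # ~ diag(lam)
D2 = P.T @ R2 @ P                # ~ identity
print(np.allclose(D1, np.diag(lam), atol=1e-8),
      np.allclose(D2, np.eye(M), atol=1e-8))
```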
no code implementations • 12 Jan 2021 • Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki
Developing microphone array technologies that perform well with only a few microphones is important, given the hardware constraints of many devices.
no code implementations • 18 Oct 2020 • Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki
We also develop a new block coordinate descent (BCD) algorithm for semiblind independent vector extraction (IVE), in which the transfer functions for several super-Gaussian sources are given a priori.
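As a generic sketch of BCD in this setting (form only, not necessarily the paper's exact block partition): the negative log-likelihood $\mathcal{L}(\theta_1, \ldots, \theta_B)$ is minimized by cyclically solving $\theta_b \leftarrow \operatorname{arg\,min}_{\theta_b} \mathcal{L}(\theta_1, \ldots, \theta_b, \ldots, \theta_B)$ with all other blocks held fixed; in the semiblind case, the blocks corresponding to the known transfer functions simply stay fixed across iterations.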