no code implementations • 30 Sep 2024 • Takafumi Moriya, Shota Horiguchi, Marc Delcroix, Ryo Masumura, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Masato Mimura
Thus, MT-RNNT-AFT can be trained without relying on accurate alignments, and it can recognize all speakers' speech with just one round of encoder processing.
Automatic Speech Recognition (ASR) +1
no code implementations • 30 Sep 2024 • Takafumi Moriya, Takanori Ashihara, Masato Mimura, Hiroshi Sato, Kohei Matsuura, Ryo Masumura, Taichi Asami
A hybrid autoregressive transducer (HAT) is a variant of the neural transducer that models the blank and non-blank posterior distributions separately.
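To make the HAT factorization above concrete, here is a minimal numerical sketch (NumPy, illustrative names only, not this paper's implementation): the blank posterior comes from a separate sigmoid term, and the non-blank labels share the remaining probability mass through a softmax.

```python
import numpy as np

def hat_output_distribution(blank_logit, label_logits):
    # HAT-style factorization: sigmoid for blank, softmax over non-blank labels,
    # scaled so that the full output distribution sums to one.
    p_blank = 1.0 / (1.0 + np.exp(-blank_logit))
    label_probs = np.exp(label_logits - label_logits.max())
    label_probs /= label_probs.sum()
    return p_blank, (1.0 - p_blank) * label_probs

# One lattice node with a toy 4-symbol vocabulary
p_blank, p_labels = hat_output_distribution(0.3, np.array([1.2, -0.5, 0.1, 0.4]))
assert abs(p_blank + p_labels.sum() - 1.0) < 1e-9
```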
no code implementations • 9 Sep 2024 • Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki
We present a distant automatic speech recognition (DASR) system developed for the CHiME-8 DASR track.
no code implementations • 1 Aug 2024 • Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Masato Mimura, Takatomo Kano, Atsunori Ogawa, Marc Delcroix
Using these datasets, our study evaluates two types of Transformer-based models: 1) cascade models that combine ASR and strong text summarization models, and 2) end-to-end (E2E) models that directly convert speech into a text summary.
Automatic Speech Recognition (ASR) +4
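As a rough illustration of the cascade vs. end-to-end contrast evaluated in this entry, the sketch below uses hypothetical model interfaces (`asr_model`, `text_summarizer`, and `e2e_model` are placeholders, not the paper's actual components):

```python
def cascade_summarize(speech, asr_model, text_summarizer):
    # Cascade: ASR first, then text summarization; ASR errors propagate
    # into the summarizer's input transcript.
    transcript = asr_model.transcribe(speech)
    return text_summarizer.summarize(transcript)

def e2e_summarize(speech, e2e_model):
    # End-to-end: a single model maps speech directly to a text summary.
    return e2e_model.summarize(speech)
```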
no code implementations • 27 Jun 2024 • Atsunori Ogawa, Naoyuki Kamo, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Takatomo Kano, Naohiro Tawara, Marc Delcroix
We investigate the effects of domain adaptation of the LLM and context carry-over when performing N-best rescoring.
Automatic Speech Recognition (ASR) +2
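The following is a minimal, generic sketch of LLM-based N-best rescoring with context carry-over, assuming a placeholder `llm_logprob` scoring function and an illustrative interpolation weight; it is not the exact scoring scheme used in the paper.

```python
def rescore_nbest(nbest, llm_logprob, context="", lm_weight=0.5):
    # nbest: list of (asr_score, hypothesis_text) pairs for one utterance.
    # The LLM is conditioned on text carried over from previous utterances.
    rescored = []
    for asr_score, hyp in nbest:
        lm_score = llm_logprob(context + " " + hyp)      # LLM log-probability
        rescored.append((asr_score + lm_weight * lm_score, hyp))
    best_hyp = max(rescored, key=lambda x: x[0])[1]
    return best_hyp, (context + " " + best_hyp).strip()  # updated context
```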
no code implementations • 31 Jan 2024 • Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima
Our analysis reveals that 1) the capacity to represent content information is somewhat unrelated to enhanced speaker representation, 2) specific layers of speech SSL models appear to be partly specialized in capturing linguistic information, and 3) speaker SSL models tend to disregard linguistic information but exhibit more sophisticated speaker representations.
1 code implementation • 14 Jun 2023 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition.
no code implementations • 7 Jun 2023 • Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix
End-to-end speech summarization (E2E SSum) directly summarizes input speech into easy-to-read short sentences with a single model.
no code implementations • 25 May 2023 • Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami
Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture.
no code implementations • 25 May 2023 • Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura
Experiments on three datasets confirm that RNNT trained with our SS approach achieves the best ASR performance.
Automatic Speech Recognition (ASR) +3
no code implementations • 9 May 2023 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka
However, since the two settings have generally been studied separately, there has been little research on how effective a cross-lingual model is compared with a monolingual model.
Automatic Speech Recognition (ASR) +2
no code implementations • 2 Mar 2023 • Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura
The first technique is to utilize a text-to-speech (TTS) system to generate synthesized speech, which is used for E2E SSum training with the text summary.
Automatic Speech Recognition (ASR) +3
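A minimal sketch of the TTS-based augmentation idea from this entry, assuming hypothetical `tts` and dataset objects (the actual TTS system and data pipeline are not specified here): synthesize speech for text-only transcripts and pair it with the reference summaries to enlarge the E2E SSum training set.

```python
def build_augmented_training_set(real_pairs, text_corpus, tts):
    # real_pairs: list of (speech, summary); text_corpus: list of (transcript, summary).
    synthetic_pairs = []
    for transcript, summary in text_corpus:
        speech = tts.synthesize(transcript)        # hypothetical TTS call
        synthetic_pairs.append((speech, summary))  # synthetic speech + text summary
    return real_pairs + synthetic_pairs            # mix real and synthetic data
```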
no code implementations • 14 Jul 2022 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka
We investigate performance on SUPERB while varying the model structure and KD methods so as to keep the number of parameters constant; this allows us to analyze the contribution of the representations introduced by varying the model architecture.
Automatic Speech Recognition (ASR) +4
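For orientation only, here is a generic layer-to-layer knowledge distillation objective of the kind used to compress speech SSL models (NumPy, illustrative; the paper's exact loss and layer mapping may differ): match selected student layers to teacher layers with an L1 term and a cosine-similarity term, so that students of different depth and width but equal parameter budgets can be trained under one objective.

```python
import numpy as np

def layerwise_kd_loss(student_states, teacher_states, cos_weight=1.0):
    # Each element is a (frames, dim) array; student and teacher layers are
    # assumed to be paired one-to-one and to share the same feature dimension.
    loss = 0.0
    for s, t in zip(student_states, teacher_states):
        l1 = np.abs(s - t).mean()
        cos = (s * t).sum(-1) / (np.linalg.norm(s, axis=-1) * np.linalg.norm(t, axis=-1) + 1e-8)
        loss += l1 - cos_weight * cos.mean()  # minimize L1, maximize cosine similarity
    return loss / len(student_states)
```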
1 code implementation • 19 May 2020 • Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi.
Automatic Speech Recognition (ASR) +3
no code implementations • LREC 2020 • Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups in Japan.
Automatic Speech Recognition (ASR) +2