no code implementations • 9 Sep 2024 • Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki
We present a distant automatic speech recognition (DASR) system developed for the CHiME-8 DASR track.
no code implementations • 30 Aug 2024 • Shota Horiguchi, Atsushi Ando, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Marc Delcroix
This paper proposes a method for extracting speaker embedding for each speaker from a variable-length recording containing multiple speakers.
no code implementations • 1 Jul 2024 • Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama, Marc Delcroix
Real-time target speaker extraction (TSE) is intended to extract the desired speaker's voice from the observed mixture of multiple speakers in a streaming manner.
no code implementations • 27 Jun 2024 • Atsushi Ando, Takafumi Moriya, Shota Horiguchi, Ryo Masumura
We also propose greedy-then-sampling (GtS) decoding, which first predicts speaking-style factors deterministically to guarantee semantic accuracy, and then generates a caption based on factor-conditioned sampling to ensure diversity.
no code implementations • 11 Feb 2024 • Kenichi Fujita, Atsushi Ando, Yusuke Ijima
This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker.
no code implementations • 22 Sep 2023 • Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa
This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations.
no code implementations • ICCV 2023 • Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura
This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs).
no code implementations • 4 Jun 2023 • Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
Target-speaker ASR systems are a promising way to only transcribe a target speaker's speech by enrolling the target speaker's information.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
1 code implementation • 28 Oct 2022 • Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis~(MSA).
no code implementations • WS 2018 • Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Ryuichiro Higashinaka, Yushi Aono
This paper proposes a fully neural network based dialogue-context online end-of-turn detection method that can utilize long-range interactive information extracted from both speaker{'}s utterances and collocutor{'}s utterances.