no code implementations • 11 Feb 2024 • Kenichi Fujita, Atsushi Ando, Yusuke Ijima
This paper proposes a speech-rhythm-based speaker-embedding method that models phoneme duration from only a few utterances by the target speaker.
no code implementations • 22 Sep 2023 • Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa
This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations.
no code implementations • ICCV 2023 • Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura
This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs).
no code implementations • 4 Jun 2023 • Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
Target-speaker ASR systems are a promising way to transcribe only a target speaker's speech by enrolling that speaker's information.
Automatic Speech Recognition (ASR) +1
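The enrollment idea behind target-speaker ASR can be illustrated with a toy sketch (this is not the paper's model; all names, the mean-pooled embedding, and the cosine-similarity masking are illustrative assumptions): an enrollment utterance is pooled into a speaker vector, and mixture frames are kept only when they resemble that vector.

```python
import numpy as np

def speaker_embedding(frames: np.ndarray) -> np.ndarray:
    """Average enrollment feature frames into one unit-norm speaker vector
    (hypothetical pooling; real systems use a trained speaker encoder)."""
    v = frames.mean(axis=0)
    return v / np.linalg.norm(v)

def target_frame_mask(mixture: np.ndarray, enrollment: np.ndarray,
                      threshold: float = 0.5) -> np.ndarray:
    """Keep only mixture frames whose cosine similarity to the enrollment
    embedding exceeds the threshold; the kept frames would then be passed
    to a downstream recognizer."""
    e = speaker_embedding(enrollment)
    sims = (mixture @ e) / np.linalg.norm(mixture, axis=1)
    return sims > threshold
```

In an actual target-speaker ASR system the enrollment embedding typically conditions a neural extractor or the recognizer itself rather than hard-masking frames; the sketch only shows how enrollment information selects the target speaker.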
1 code implementation • 28 Oct 2022 • Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA).
no code implementations • WS 2018 • Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Ryuichiro Higashinaka, Yushi Aono
This paper proposes a fully neural-network-based, dialogue-context online end-of-turn detection method that can utilize long-range interactive information extracted from both the speaker's and the collocutor's utterances.
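The online detection setting can be sketched with a toy model (a hedged illustration only, not the paper's architecture; the class name, the exponential-moving-average context, and the linear scorer are all assumptions): the detector consumes interleaved feature frames from both parties and emits an end-of-turn probability at every step.

```python
import numpy as np

class OnlineEndOfTurnDetector:
    """Toy online detector: an exponential moving average over interleaved
    speaker/collocutor feature frames feeds a fixed linear scorer. A real
    system would use a trained recurrent network over richer features."""

    def __init__(self, weights, bias: float = 0.0, decay: float = 0.9):
        self.w = np.asarray(weights, dtype=float)
        self.b = bias
        self.decay = decay
        self.state = np.zeros_like(self.w)  # running dialogue context

    def step(self, frame) -> float:
        """Consume one feature frame (from either party) and return the
        probability that the current turn is ending."""
        frame = np.asarray(frame, dtype=float)
        self.state = self.decay * self.state + (1.0 - self.decay) * frame
        logit = self.state @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-logit))
```

Because the state is updated frame by frame, the detector can fire as soon as the evidence accumulates, which is the "online" property the abstract refers to.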