no code implementations • 11 Feb 2024 • Kenichi Fujita, Atsushi Ando, Yusuke Ijima
This paper proposes a speech-rhythm-based speaker-embedding method that models phoneme duration from only a few utterances by the target speaker.
no code implementations • 22 Sep 2023 • Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa
This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations.
no code implementations • ICCV 2023 • Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura
This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs).
no code implementations • 4 Jun 2023 • Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
Target-speaker ASR systems are a promising way to transcribe only a target speaker's speech by enrolling that speaker's information.
Automatic Speech Recognition (ASR) +1
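The enrollment idea behind target-speaker ASR can be illustrated with a toy sketch (this is not the paper's model; all names, the mean-pooled embedding, and the cosine-similarity masking are illustrative assumptions): an enrollment utterance is pooled into a speaker vector, and mixture frames are kept only when they resemble that vector.

```python
import numpy as np

def speaker_embedding(frames: np.ndarray) -> np.ndarray:
    """Average enrollment feature frames into one unit-norm speaker vector
    (hypothetical pooling; real systems use a trained speaker encoder)."""
    v = frames.mean(axis=0)
    return v / np.linalg.norm(v)

def target_frame_mask(mixture: np.ndarray, enrollment: np.ndarray,
                      threshold: float = 0.5) -> np.ndarray:
    """Keep only mixture frames whose cosine similarity to the enrollment
    embedding exceeds the threshold; the kept frames would then be passed
    to a downstream recognizer."""
    e = speaker_embedding(enrollment)
    sims = (mixture @ e) / np.linalg.norm(mixture, axis=1)
    return sims > threshold
```

In an actual target-speaker ASR system the enrollment embedding typically conditions a neural extractor or the recognizer itself rather than hard-masking frames; the sketch only shows how enrollment information selects the target speaker.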
1 code implementation • 28 Oct 2022 • Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA).
no code implementations • WS 2018 • Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Ryuichiro Higashinaka, Yushi Aono
This paper proposes a fully neural-network-based, dialogue-context online end-of-turn detection method that can utilize long-range interactive information extracted from both the speaker's and the collocutor's utterances.
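The online detection setting can be sketched with a toy model (a hedged illustration only, not the paper's architecture; the class name, the exponential-moving-average context, and the linear scorer are all assumptions): the detector consumes interleaved feature frames from both parties and emits an end-of-turn probability at every step.

```python
import numpy as np

class OnlineEndOfTurnDetector:
    """Toy online detector: an exponential moving average over interleaved
    speaker/collocutor feature frames feeds a fixed linear scorer. A real
    system would use a trained recurrent network over richer features."""

    def __init__(self, weights, bias: float = 0.0, decay: float = 0.9):
        self.w = np.asarray(weights, dtype=float)
        self.b = bias
        self.decay = decay
        self.state = np.zeros_like(self.w)  # running dialogue context

    def step(self, frame) -> float:
        """Consume one feature frame (from either party) and return the
        probability that the current turn is ending."""
        frame = np.asarray(frame, dtype=float)
        self.state = self.decay * self.state + (1.0 - self.decay) * frame
        logit = self.state @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-logit))
```

Because the state is updated frame by frame, the detector can fire as soon as the evidence accumulates, which is the "online" property the abstract refers to.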