Search Results for author: Atsushi Ando

Found 10 papers, 1 paper with code

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling

no code implementations · 1 Jul 2024 · Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama, Marc Delcroix

Real-time target speaker extraction (TSE) is intended to extract the desired speaker's voice from the observed mixture of multiple speakers in a streaming manner.

Target Speaker Extraction

Factor-Conditioned Speaking-Style Captioning

no code implementations · 27 Jun 2024 · Atsushi Ando, Takafumi Moriya, Shota Horiguchi, Ryo Masumura

We also propose greedy-then-sampling (GtS) decoding, which first predicts speaking-style factors deterministically to guarantee semantic accuracy, and then generates a caption based on factor-conditioned sampling to ensure diversity.

Diversity
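The greedy-then-sampling idea above can be illustrated with a toy sketch. This is not the paper's neural captioning model: the style factors, vocabulary, and the way conditioning biases word choice are all invented for illustration. It only shows the two-stage control flow — decode the speaking-style factors greedily (deterministic, fixing semantics), then sample the caption text conditioned on those factors (stochastic, preserving diversity).

```python
import random

random.seed(0)

# Hypothetical factor distributions and caption vocabulary (illustrative only).
STYLE_FACTORS = {
    "gender": {"male": 0.9, "female": 0.1},
    "pitch": {"high": 0.2, "low": 0.8},
}
CAPTION_WORDS = ["a", "man", "speaks", "slowly", "in", "low", "voice"]

def greedy_factor(dist):
    # Greedy step: take the argmax value for each speaking-style factor,
    # fixing the caption's semantic content deterministically.
    return max(dist, key=dist.get)

def sample_caption(factors, length=5):
    # Sampling step: generate the surface wording stochastically,
    # conditioned on the fixed factors, so captions stay diverse.
    # Conditioning is faked here by up-weighting words matching factor values.
    preferred = set(factors.values())
    weights = [3.0 if w in preferred else 1.0 for w in CAPTION_WORDS]
    return [random.choices(CAPTION_WORDS, weights=weights)[0]
            for _ in range(length)]

factors = {name: greedy_factor(dist) for name, dist in STYLE_FACTORS.items()}
caption = sample_caption(factors)
print(factors)
print(" ".join(caption))
```

Running this twice with different seeds yields the same factors but different wordings, which is exactly the accuracy/diversity split GtS decoding aims for.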

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

no code implementations · 11 Feb 2024 · Kenichi Fujita, Atsushi Ando, Yusuke Ijima

This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker.

Speaker Identification · Speech Synthesis

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

1 code implementation · 28 Oct 2022 · Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA).

Multimodal Sentiment Analysis

Neural Dialogue Context Online End-of-Turn Detection

no code implementations · WS 2018 · Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Ryuichiro Higashinaka, Yushi Aono

This paper proposes a fully neural network-based dialogue-context online end-of-turn detection method that can utilize long-range interactive information extracted from both the speaker's utterances and the collocutor's utterances.

Action Detection · Spoken Dialogue Systems
