Search Results for author: Kentaro Tachibana

Found 14 papers, 4 papers with code

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

1 code implementation, 12 Jun 2024. Masaya Kawamura, Ryuichi Yamamoto, Yuma Shirahata, Takuya Hasumi, Kentaro Tachibana

We employ a hybrid approach to construct prompt annotations: (1) manual annotations that capture human perceptions of speaker characteristics and (2) synthetic annotations on speaking style.

Tasks: Text-to-Speech

Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data

no code implementations, 12 Jun 2024. Yuma Shirahata, Byeongseon Park, Ryuichi Yamamoto, Kentaro Tachibana

For creating a TTS dataset that consists of label-speech paired data, the proposed annotation model leverages an automatic speech recognition (ASR) model to obtain phonemic and prosodic labels from unlabeled speech samples.

Tasks: Automatic Speech Recognition (ASR), +4

PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions

no code implementations, 15 Sep 2023. Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana

We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions.

Tasks: Text-to-Speech

CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center

no code implementations, 23 May 2023. Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

We present CALLS, a Japanese speech corpus that considers phone calls in a customer center as a new domain of empathetic spoken dialogue.

Tasks: Speech Synthesis

ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings

no code implementations, 23 May 2023. Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari

We focus on ChatGPT's reading comprehension and introduce it to EDSS, a task of synthesizing speech that can empathize with the interlocutor's emotion.

Tasks: Chatbot, Reading Comprehension, +2

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

no code implementations, 28 Oct 2022. Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana

From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch.

Tasks: Decoder, Diversity, +4
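The snippet above describes deriving a sample-level sinusoidal source from pitch so the waveform decoder can reproduce it accurately. A minimal sketch of that idea, assuming nearest-neighbour F0 upsampling and cumulative-phase synthesis (function name, hop size, and sample rate are illustrative, not from the paper):

```python
import numpy as np

def sinusoidal_source(f0_frames, frame_hop=256, sample_rate=24000):
    """Upsample frame-level F0 (Hz, 0 = unvoiced) to a sample-level sine source."""
    # Nearest-neighbour upsampling of F0 to the sample rate.
    f0 = np.repeat(np.asarray(f0_frames, dtype=np.float64), frame_hop)
    # Cumulative phase keeps the sinusoid continuous across frame boundaries.
    phase = 2.0 * np.pi * np.cumsum(f0 / sample_rate)
    source = np.sin(phase)
    source[f0 == 0] = 0.0  # emit silence in unvoiced regions
    return source

# Example: 10 voiced frames at 100 Hz followed by 10 unvoiced frames.
src = sinusoidal_source([100.0] * 10 + [0.0] * 10)
```

The cumulative-phase form avoids phase discontinuities at frame boundaries, which would otherwise appear as clicks in the generated source.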

Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History

no code implementations, 16 Jun 2022. Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

To train the empathetic DSS model effectively, we investigate 1) a self-supervised learning model pretrained with large speech corpora, 2) a style-guided training using a prosody embedding of the current utterance to be predicted by the dialogue context embedding, 3) a cross-modal attention to combine text and speech modalities, and 4) a sentence-wise embedding to achieve fine-grained prosody modeling rather than utterance-wise modeling.

Tasks: Self-Supervised Learning, Sentence, +2

Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation

no code implementations, 21 Apr 2022. Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana

Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1,000 utterances of the target speaker's neutral data are available.

Tasks: Data Augmentation, Text-to-Speech, +2
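The simplest form of pitch-shift augmentation can be sketched as resampling by a semitone-derived frequency ratio (a deliberate simplification: this also changes duration, whereas production augmentation typically uses duration-preserving shifters; the function below is illustrative, not the paper's method):

```python
import numpy as np

def pitch_shift_resample(wave, semitones):
    """Shift pitch by resampling with linear interpolation (duration changes too)."""
    factor = 2.0 ** (semitones / 12.0)   # frequency ratio for the shift
    n_out = int(round(len(wave) / factor))
    old_idx = np.arange(n_out) * factor  # fractional read positions in the source
    return np.interp(old_idx, np.arange(len(wave)), wave)

# Example: a 440 Hz tone shifted up 12 semitones (one octave) becomes ~880 Hz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
shifted = pitch_shift_resample(tone, 12)
```

Shifting each training utterance by a few semitones up and down is one way to widen the pitch range the model sees, which is the stabilizing effect the abstract describes.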

STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent

no code implementations, 28 Mar 2022. Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

We describe our methodology to construct an empathetic dialogue speech corpus and report the analysis results of the STUDIES corpus.

Tasks: Text-to-Speech

Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis

1 code implementation, 26 Apr 2021. Kosuke Futamata, Byeongseon Park, Ryuichi Yamamoto, Kentaro Tachibana

We propose a novel phrase break prediction method that combines implicit features extracted from a pre-trained large language model, a.k.a. BERT, and explicit features extracted from a BiLSTM with linguistic features.

Tasks: Language Modeling, +6
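The implicit/explicit feature combination above can be sketched as a per-token concatenation followed by a scoring head. All dimensions, weights, and names below are illustrative assumptions (untrained random stand-ins, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

n_tokens = 8
implicit = rng.normal(size=(n_tokens, 768))   # stand-in for BERT token outputs
explicit = rng.normal(size=(n_tokens, 128))   # stand-in for BiLSTM linguistic features

# Fuse the two feature streams per token.
fused = np.concatenate([implicit, explicit], axis=1)   # shape (n_tokens, 896)

# Linear scoring head with random weights (illustration only, untrained).
w = rng.normal(size=(896,)) * 0.01
logits = fused @ w
break_prob = 1.0 / (1.0 + np.exp(-logits))   # per-token sigmoid: break / no-break
```

In a trained system, `break_prob` would be thresholded per token to decide where the synthesizer inserts phrase breaks.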

Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks

no code implementations, 6 Sep 2018. Koichi Hamada, Kentaro Tachibana, Tianqi Li, Hiroto Honda, Yusuke Uchida

Our method tackles the limitations by progressively increasing the resolution of both generated images and structural conditions during training.

Tasks: Unity, Video Generation
