Search Results for author: Ryuichi Yamamoto

Found 16 papers, 10 papers with code

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

1 code implementation · 9 Apr 2019 · Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

As this process encourages the student to model the distribution of realistic speech waveforms, the perceptual quality of the synthesized speech becomes much more natural.
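
A rough sketch of the combined objective, assuming hypothetical `student`, `teacher`, and `discriminator` PyTorch modules (an illustration of the idea, not the paper's exact formulation or weighting):

    import torch
    import torch.nn.functional as F

    def generator_loss(student, teacher, discriminator, noise, mel, lambda_adv=1.0):
        fake = student(noise, mel)                    # parallel student: noise + mel -> waveform
        with torch.no_grad():
            teacher_ll = teacher.log_prob(fake, mel)  # teacher density of the student's samples
        distill = -teacher_ll.mean()                  # stand-in for the KL-based distillation term
        score = discriminator(fake)
        adv = F.mse_loss(score, torch.ones_like(score))  # least-squares GAN objective
        return distill + lambda_adv * adv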

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit

3 code implementations · 24 Oct 2019 · Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan

Furthermore, the unified design enables the integration of ASR functions with TTS, e.g., ASR-based objective evaluation and semi-supervised learning with both ASR and TTS models.

Automatic Speech Recognition (ASR) +1
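
A rough sketch of ASR-based objective evaluation: synthesize an utterance, transcribe it back with an ASR model, and score the transcript against the input text. `tts_synthesize` and `asr_transcribe` below are hypothetical stand-ins, not actual ESPnet API calls:

    def char_error_rate(ref: str, hyp: str) -> float:
        # Levenshtein distance normalized by reference length
        d = [[i + j if min(i, j) == 0 else 0 for j in range(len(hyp) + 1)]
             for i in range(len(ref) + 1)]
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    # cer = char_error_rate(text, asr_transcribe(tts_synthesize(text)))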

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

12 code implementations · 25 Oct 2019 · Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.

Generative Adversarial Network Speech Synthesis +1
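
The multi-resolution spectrogram constraint can be sketched as a sum of spectral-convergence and log-magnitude losses over several STFT settings; the resolutions below are illustrative defaults, not necessarily the paper's exact configuration:

    import torch
    import torch.nn.functional as F

    def stft_mag(x, fft_size, hop, win):
        window = torch.hann_window(win, device=x.device)
        spec = torch.stft(x, fft_size, hop, win, window=window, return_complex=True)
        return spec.abs().clamp(min=1e-7)

    def multi_resolution_stft_loss(fake, real,
                                   resolutions=((1024, 120, 600),
                                                (2048, 240, 1200),
                                                (512, 50, 240))):
        loss = 0.0
        for fft_size, hop, win in resolutions:
            f = stft_mag(fake, fft_size, hop, win)
            r = stft_mag(real, fft_size, hop, win)
            sc = torch.norm(r - f, p="fro") / torch.norm(r, p="fro")  # spectral convergence
            mag = F.l1_loss(torch.log(f), torch.log(r))               # log STFT magnitude loss
            loss = loss + sc + mag
        return loss / len(resolutions)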

Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

1 code implementation · 31 Jan 2020 · Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).

Quantization Speech Synthesis
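
A minimal sketch of the MDN side, with hypothetical dimensions: a network head predicts Gaussian mixture parameters for the excitation, and the final speech sample is the linear-prediction term plus an excitation drawn from that mixture:

    import torch
    import torch.nn as nn

    class GaussianMDNHead(nn.Module):
        def __init__(self, hidden_dim, num_mixtures=2):
            super().__init__()
            self.proj = nn.Linear(hidden_dim, 3 * num_mixtures)  # pi, mu, log_sigma per mixture
            self.num_mixtures = num_mixtures

        def forward(self, h):
            pi_logits, mu, log_sigma = self.proj(h).chunk(3, dim=-1)
            return pi_logits.softmax(-1), mu, log_sigma.exp()

    # LP structure: the output sample is the LP prediction plus the sampled
    # excitation: s[n] = sum_k a_k * s[n-k] + e[n]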

Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators

no code implementations · 27 Oct 2020 · Ryuichi Yamamoto, Eunwoo Song, Min-Jae Hwang, Jae-Min Kim

This paper proposes voicing-aware conditional discriminators for Parallel WaveGAN-based waveform synthesis systems.
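
One way to sketch the idea under assumed interfaces (the paper's actual discriminator designs differ): given a sample-level V/UV mask, route voiced and unvoiced regions to separate discriminators:

    import torch
    import torch.nn as nn

    class VoicingAwareDiscriminator(nn.Module):
        def __init__(self, voiced_disc: nn.Module, unvoiced_disc: nn.Module):
            super().__init__()
            self.voiced_disc = voiced_disc      # judges periodic (voiced) regions
            self.unvoiced_disc = unvoiced_disc  # judges noise-like (unvoiced) regions

        def forward(self, wav, vuv_mask):
            # vuv_mask: 1.0 where a sample is voiced, 0.0 where unvoiced
            d_voiced = self.voiced_disc(wav * vuv_mask)
            d_unvoiced = self.unvoiced_disc(wav * (1.0 - vuv_mask))
            return d_voiced, d_unvoiced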

Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis

1 code implementation · 26 Apr 2021 · Kosuke Futamata, Byeongseon Park, Ryuichi Yamamoto, Kentaro Tachibana

We propose a novel phrase break prediction method that combines implicit features extracted from a pre-trained large language model, a.k.a. BERT, and explicit features extracted from a BiLSTM with linguistic features.

Language Modelling Large Language Model +3
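
A minimal sketch of the feature fusion, with hypothetical dimensions: implicit token features from a pre-trained BERT are concatenated with explicit linguistic features encoded by a BiLSTM, then classified per token:

    import torch
    import torch.nn as nn

    class PhraseBreakPredictor(nn.Module):
        def __init__(self, bert_dim=768, ling_dim=32, hidden=128, num_classes=2):
            super().__init__()
            self.bilstm = nn.LSTM(ling_dim, hidden, batch_first=True,
                                  bidirectional=True)
            self.classifier = nn.Linear(bert_dim + 2 * hidden, num_classes)

        def forward(self, bert_feats, ling_feats):
            # bert_feats: (B, T, bert_dim) implicit features from a pre-trained BERT
            # ling_feats: (B, T, ling_dim) explicit linguistic features
            explicit, _ = self.bilstm(ling_feats)
            fused = torch.cat([bert_feats, explicit], dim=-1)
            return self.classifier(fused)  # per-token break / no-break logits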

Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation

no code implementations · 21 Apr 2022 · Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana

Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1,000 utterances of the target speaker's neutral data are available.

Data Augmentation Voice Conversion
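
The pitch-shift augmentation itself can be sketched with librosa; the semitone steps below are illustrative, not the paper's setting:

    import librosa

    def pitch_shift_augment(wav_path, n_steps):
        # Load at the file's native sampling rate and shift pitch by n_steps semitones
        y, sr = librosa.load(wav_path, sr=None)
        return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps), sr

    # e.g., generate shifted copies of a neutral utterance:
    # for n in (-2, -1, 1, 2):
    #     y_shift, sr = pitch_shift_augment("neutral_0001.wav", n)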

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder

no code implementations · 30 Jun 2022 · Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim

In the proposed method, we first adopt a variational autoencoder whose posterior distribution is utilized to extract latent features representing acoustic similarity between the recorded and synthetic corpora.

Speech Synthesis
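
A minimal sketch under assumed interfaces: use the posterior mean of a trained VAE encoder as a latent acoustic feature and score synthetic utterances by their distance to the recorded corpus (the full method further ranks candidates with a ranking support vector machine, omitted here):

    import torch

    @torch.no_grad()
    def latent_features(vae_encoder, mels):
        mu, _logvar = vae_encoder(mels)   # posterior mean as an acoustic embedding
        return mu.mean(dim=1)             # utterance-level feature (average over time)

    @torch.no_grad()
    def similarity_to_recorded(vae_encoder, synth_mels, recorded_mels):
        z_syn = latent_features(vae_encoder, synth_mels)
        z_rec = latent_features(vae_encoder, recorded_mels).mean(dim=0, keepdim=True)
        return -torch.cdist(z_syn, z_rec).squeeze(-1)  # higher score = more similar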

NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit

2 code implementations · 28 Oct 2022 · Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda

This paper describes the design of NNSVS, an open-source software for neural network-based singing voice synthesis research.

Singing Voice Synthesis

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

no code implementations · 28 Oct 2022 · Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana

From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch.

Emotional Speech Synthesis Variational Inference
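
The sample-level sinusoidal source can be sketched NSF-style: integrate an upsampled F0 track into a phase, take its sine, and fall back to noise in unvoiced regions (an illustration, not necessarily the paper's exact module):

    import torch

    def sinusoidal_source(f0, sample_rate=24000, noise_std=0.003):
        # f0: (B, T) sample-level F0 in Hz, 0 for unvoiced samples
        phase = 2 * torch.pi * torch.cumsum(f0 / sample_rate, dim=1)
        sine = torch.sin(phase)
        voiced = (f0 > 0).float()
        noise = noise_std * torch.randn_like(sine)
        # sine in voiced regions, small noise in unvoiced regions
        return voiced * sine + (1.0 - voiced) * noise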

PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions

no code implementations · 15 Sep 2023 · Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana

We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions.
