Search Results for author: Ryuichi Yamamoto

Found 8 papers, 6 papers with code

Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis

no code implementations · 26 Apr 2021 · Kosuke Futamata, Byeongseon Park, Ryuichi Yamamoto, Kentaro Tachibana

We propose a novel phrase break prediction method that combines implicit features extracted from a pre-trained large language model, a.k.a. BERT, and explicit features extracted from BiLSTM with linguistic features.

Language Modelling · Speech Synthesis · +1

Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators

no code implementations · 27 Oct 2020 · Ryuichi Yamamoto, Eunwoo Song, Min-Jae Hwang, Jae-Min Kim

This paper proposes voicing-aware conditional discriminators for Parallel WaveGAN-based waveform synthesis systems.

Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

1 code implementation · 31 Jan 2020 · Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).

Quantization · Speech Synthesis

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

11 code implementations · 25 Oct 2019 · Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.

Speech Synthesis · Text-To-Speech Synthesis

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit

3 code implementations · 24 Oct 2019 · Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan

Furthermore, the unified design enables the integration of ASR functions with TTS, e.g., ASR-based objective evaluation and semi-supervised learning with both ASR and TTS models.

Speech Recognition

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

1 code implementation · 9 Apr 2019 · Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

As this process encourages the student to model the distribution of realistic speech waveforms, the perceptual quality of the synthesized speech becomes much more natural.
