Search Results for author: Yusuke Yasuda

Found 13 papers, 4 papers with code

ESPnet2-TTS: Extending the Edge of TTS Research

1 code implementation • 15 Oct 2021 • Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe

This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

3 code implementations • 23 Oct 2019 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.

Audio and Speech Processing

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

1 code implementation • 4 May 2020 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi

This is followed by an analysis on synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach.

Speech Synthesis

Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

1 code implementation • 29 Oct 2018 • Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi

Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons.

Speech Synthesis, Text-To-Speech Synthesis

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments

no code implementations • 30 Aug 2019 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

The advantages of our approach are that we can simplify many modules for the soft attention and that we can train the end-to-end TTS model using a single likelihood function.

Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment

no code implementations • 28 Oct 2019 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Sequence-to-sequence text-to-speech (TTS) is dominated by soft-attention-based methods.

Hard Attention, Speech Synthesis, +1

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

no code implementations • 20 May 2020 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Our experiments suggest that a) a neural sequence-to-sequence TTS system should have a sufficient number of model parameters to produce high-quality speech, b) it should also use a powerful encoder when it takes characters as inputs, and c) the encoder still has room for improvement and needs an improved architecture to learn supra-segmental features more appropriately.

Speech Synthesis, Text-To-Speech Synthesis

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

no code implementations • 19 Oct 2020 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Explicit duration modeling is a key to achieving robust and efficient alignment in text-to-speech synthesis (TTS).

Speech Synthesis, Text-To-Speech Synthesis

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

no code implementations • 10 Nov 2020 • Erica Cooper, Xin Wang, Yi Zhao, Yusuke Yasuda, Junichi Yamagishi

We explore pretraining strategies including choice of base corpus with the aim of choosing the best strategy for zero-shot multi-speaker end-to-end synthesis.

Speech Synthesis

Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder

no code implementations • 16 Dec 2022 • Yusuke Yasuda, Tomoki Toda

We propose a TTS method based on latent variable conversion using a diffusion probabilistic model and the variational autoencoder (VAE).

Representation Learning, Speech Synthesis, +1

Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language

no code implementations • 16 Dec 2022 • Yusuke Yasuda, Tomoki Toda

To tackle the challenge of rendering correct pitch accent in Japanese end-to-end TTS, we adopt PnG BERT, a self-supervised pretrained model in the character and phoneme domain for TTS.

Language Modelling, Speech Synthesis, +1

Preference-based training framework for automatic speech quality assessment using deep neural network

no code implementations • 29 Aug 2023 • Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda

We propose a training framework of SQA models that can be trained with only preference scores derived from pairs of MOS to improve ranking prediction.

Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment

no code implementations • 10 Mar 2024 • Yusuke Yasuda, Tomoki Toda

A preference-based subjective evaluation is a key method for evaluating generative media reliably.
