Search Results for author: Yusuke Yasuda

Found 9 papers, 4 papers with code

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

no code implementations • 10 Nov 2020 • Erica Cooper, Xin Wang, Yi Zhao, Yusuke Yasuda, Junichi Yamagishi

We explore pretraining strategies including choice of base corpus with the aim of choosing the best strategy for zero-shot multi-speaker end-to-end synthesis.

Speech Synthesis

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

no code implementations • 19 Oct 2020 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Explicit duration modeling is a key to achieving robust and efficient alignment in text-to-speech synthesis (TTS).

Speech Synthesis • Text-To-Speech Synthesis
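The core idea behind explicit duration modeling, as used in duration-based TTS systems generally, can be sketched as a length regulator that expands each phone-level encoder frame by its predicted duration. This is a generic illustration of the alignment mechanism, not the paper's implementation; the function name and shapes are hypothetical.

```python
import numpy as np

def length_regulate(encoder_outputs, durations):
    """Expand each encoder frame by its duration in output frames.

    encoder_outputs: (num_phones, dim) array of phone-level features.
    durations: (num_phones,) integer array of predicted frame counts.
    Returns a (sum(durations), dim) array aligned with the acoustic
    frame rate. Hypothetical sketch of explicit-duration alignment.
    """
    return np.repeat(encoder_outputs, durations, axis=0)

# toy example: 3 phone-level frames with durations 2, 1, 3
enc = np.array([[1.0], [2.0], [3.0]])
dur = np.array([2, 1, 3])
out = length_regulate(enc, dur)
print(out.shape)  # (6, 1)
```

Because the expansion is deterministic given the durations, the decoder no longer needs a soft attention mechanism to discover the alignment, which is what makes duration-based alignment robust.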

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

no code implementations • 20 May 2020 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Our experiments suggest that a) a neural sequence-to-sequence TTS system should have a sufficient number of model parameters to produce high-quality speech, b) it should also use a powerful encoder when it takes characters as inputs, and c) the encoder still has room for improvement and needs an improved architecture to learn supra-segmental features more appropriately.

Speech Synthesis • Text-To-Speech Synthesis

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

1 code implementation • 4 May 2020 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi

This is followed by an analysis on synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach.

Speech Synthesis

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

3 code implementations • 23 Oct 2019 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.

Audio and Speech Processing
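Speaker-embedding adaptation of the kind described above is commonly realized by broadcasting a fixed per-speaker embedding onto every encoder frame before decoding. The sketch below illustrates that general conditioning pattern only; the function name and dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def condition_on_speaker(encoder_outputs, speaker_embedding):
    """Concatenate a fixed speaker embedding onto each encoder frame.

    encoder_outputs: (T, enc_dim) array of text-encoder features.
    speaker_embedding: (spk_dim,) vector, e.g. from a pretrained
    speaker-verification model. Returns (T, enc_dim + spk_dim).
    A hypothetical sketch of speaker conditioning, not the paper's code.
    """
    T = encoder_outputs.shape[0]
    tiled = np.tile(speaker_embedding, (T, 1))  # repeat for every frame
    return np.concatenate([encoder_outputs, tiled], axis=1)

# toy example: 5 encoder frames of dim 8, a 4-dim speaker embedding
enc = np.zeros((5, 8))
spk = np.ones(4)
cond = condition_on_speaker(enc, spk)
print(cond.shape)  # (5, 12)
```

For zero-shot adaptation, the same decoder is simply fed the embedding of an unseen speaker at inference time; the quality gap the abstract mentions arises because the decoder has never been trained on that region of the embedding space.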

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments

no code implementations • 30 Aug 2019 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

The advantages of our approach are that we can simplify many of the modules required for soft attention and that we can train the end-to-end TTS model using a single likelihood function.

Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

1 code implementation • 29 Oct 2018 • Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi

Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons.

Speech Synthesis • Text-To-Speech Synthesis