Search Results for author: Yusuke Yasuda

Found 13 papers, 4 papers with code

Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment

no code implementations · 10 Mar 2024 · Yusuke Yasuda, Tomoki Toda

Preference-based subjective evaluation is a key method for reliably evaluating generative media.

Preference-based training framework for automatic speech quality assessment using deep neural network

no code implementations · 29 Aug 2023 · Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda

We propose a training framework for SQA models that can be trained using only preference scores derived from pairs of MOS, in order to improve ranking prediction.
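The abstract only sketches the idea of learning from MOS-derived preferences. As an illustration only (a generic Bradley-Terry-style pairwise loss, not the paper's actual formulation), training on a preference label rather than an absolute score might look like this:

```python
import math

def preference_loss(score_a: float, score_b: float, pref: float) -> float:
    """Generic pairwise preference loss (illustrative, not the paper's method).

    score_a, score_b: model-predicted quality scores for two utterances.
    pref: 1.0 if utterance A had the higher MOS in the listening test, else 0.0.
    The sigmoid of the score difference is treated as the probability that
    A is preferred, and we take binary cross-entropy against the label.
    """
    p = 1.0 / (1.0 + math.exp(-(score_a - score_b)))
    eps = 1e-12  # guard against log(0)
    return -(pref * math.log(p + eps) + (1.0 - pref) * math.log(1.0 - p + eps))
```

A model trained this way only needs the ordering of each MOS pair, not the absolute ratings, which is the property the abstract highlights.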

Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language

no code implementations · 16 Dec 2022 · Yusuke Yasuda, Tomoki Toda

To tackle the challenge of rendering correct pitch accent in Japanese end-to-end TTS, we adopt PnG BERT, a self-supervised pretrained model in the character and phoneme domain for TTS.

Language Modelling · Speech Synthesis · +1

Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder

no code implementations · 16 Dec 2022 · Yusuke Yasuda, Tomoki Toda

We propose a TTS method based on latent variable conversion using a diffusion probabilistic model and the variational autoencoder (VAE).

Representation Learning · Speech Synthesis · +1

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

no code implementations · 10 Nov 2020 · Erica Cooper, Xin Wang, Yi Zhao, Yusuke Yasuda, Junichi Yamagishi

We explore pretraining strategies including choice of base corpus with the aim of choosing the best strategy for zero-shot multi-speaker end-to-end synthesis.

Speech Synthesis

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

no code implementations · 19 Oct 2020 · Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Explicit duration modeling is a key to achieving robust and efficient alignment in text-to-speech synthesis (TTS).

Speech Synthesis · Text-To-Speech Synthesis

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

no code implementations · 20 May 2020 · Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Our experiments suggest that a) a neural sequence-to-sequence TTS system should have a sufficient number of model parameters to produce high-quality speech, b) it should also use a powerful encoder when it takes characters as inputs, and c) the encoder still has room for improvement and needs an improved architecture to learn supra-segmental features more appropriately.

Speech Synthesis · Text-To-Speech Synthesis

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

1 code implementation · 4 May 2020 · Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi

This is followed by an analysis on synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach.

Speech Synthesis

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

3 code implementations · 23 Oct 2019 · Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.

Audio and Speech Processing

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments

no code implementations · 30 Aug 2019 · Yusuke Yasuda, Xin Wang, Junichi Yamagishi

The advantages of our approach are that it eliminates many modules required for soft attention and that it allows training the end-to-end TTS model with a single likelihood function.


Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

1 code implementation · 29 Oct 2018 · Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi

Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons.

Speech Synthesis · Text-To-Speech Synthesis
