1 code implementation • 25 Jun 2024 • Duc-Tuan Truong, Ruijie Tao, Tuan Nguyen, Hieu-Thi Luong, Kong Aik Lee, Eng Siong Chng
Recent synthetic speech detectors built on the Transformer model outperform their convolutional neural network counterparts.
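To make the claim concrete, here is a minimal sketch of a Transformer-based detector in PyTorch; the encoder depth, pooling, and dimensions are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a Transformer-based spoofing detector (illustrative,
# not the paper's model). Input: a sequence of acoustic feature frames;
# output: a bonafide/spoof logit pair.
import torch
import torch.nn as nn

class TransformerDetector(nn.Module):
    def __init__(self, feat_dim=80, d_model=144, nhead=4, num_layers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)  # frame-level projection
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 2)  # bonafide vs. spoofed

    def forward(self, feats):            # feats: (batch, frames, feat_dim)
        hidden = self.encoder(self.proj(feats))
        pooled = hidden.mean(dim=1)      # average over time
        return self.head(pooled)

logits = TransformerDetector()(torch.randn(2, 200, 80))  # shape (2, 2)
```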
no code implementations • 11 Oct 2021 • Hieu-Thi Luong, Junichi Yamagishi
Emotional and controllable speech synthesis is a topic that has received much attention.
no code implementations • 25 Jun 2021 • Hieu-Thi Luong, Junichi Yamagishi
Generally speaking, the main objective when training a neural speech synthesis system is to synthesize natural and expressive speech from the network's output layer, with little attention paid to the hidden layers.
no code implementations • 8 Oct 2020 • Hieu-Thi Luong, Junichi Yamagishi
As the recently proposed voice cloning system, NAUTILUS, is capable of cloning unseen voices using untranscribed speech, we investigate the feasibility of using it to develop a unified cross-lingual TTS/VC system.
no code implementations • 22 May 2020 • Hieu-Thi Luong, Junichi Yamagishi
By using a multi-speaker speech corpus to train all requisite encoders and decoders in the initial training stage, our system can clone unseen voices using untranscribed speech of target speakers on the basis of the backpropagation algorithm.
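A minimal sketch of this kind of adaptation step, assuming a generic PyTorch setup: the pretrained encoder and decoder stay frozen, and backpropagating a reconstruction loss on untranscribed speech updates only a new speaker embedding. All module names and sizes are illustrative, not the NAUTILUS code.

```python
# Hedged sketch: clone an unseen voice from untranscribed speech by
# backpropagating a reconstruction loss into a speaker embedding while
# the pretrained encoder and decoder remain frozen.
import torch
import torch.nn as nn

feat_dim, latent_dim, spk_dim = 80, 128, 64
speech_encoder = nn.GRU(feat_dim, latent_dim, batch_first=True)  # pretrained, frozen
decoder = nn.Linear(latent_dim + spk_dim, feat_dim)              # pretrained, frozen
for module in (speech_encoder, decoder):
    for p in module.parameters():
        p.requires_grad_(False)

# Only the new speaker's embedding is trainable.
spk_emb = nn.Parameter(torch.zeros(spk_dim))
opt = torch.optim.Adam([spk_emb], lr=1e-3)

untranscribed = torch.randn(4, 300, feat_dim)  # stand-in target-speaker features
for step in range(100):
    latents, _ = speech_encoder(untranscribed)
    cond = spk_emb.expand(*latents.shape[:2], spk_dim)
    recon = decoder(torch.cat([latents, cond], dim=-1))
    loss = nn.functional.l1_loss(recon, untranscribed)
    opt.zero_grad()
    loss.backward()   # gradient flows only into spk_emb
    opt.step()
```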
no code implementations • 14 Sep 2019 • Hieu-Thi Luong, Junichi Yamagishi
Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a similar objective: generating speech with a target voice.
no code implementations • 18 Jun 2019 • Hieu-Thi Luong, Junichi Yamagishi
In this study, we propose a novel speech synthesis model that can be adapted to unseen speakers by fine-tuning part or all of the network using either transcribed or untranscribed speech.
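As a rough illustration of the "part or all" choice, the snippet below selects which parameters to unfreeze before adaptation; the module names are hypothetical, not the paper's architecture.

```python
# Hedged sketch of partial vs. full fine-tuning (illustrative names).
# "part" freezes everything except the speaker-specific layer; "all"
# fine-tunes the whole network.
import torch.nn as nn

model = nn.ModuleDict({
    "text_encoder": nn.Linear(128, 128),
    "speaker_layer": nn.Linear(128, 128),  # speaker-specific component
    "decoder": nn.Linear(128, 80),
})

def adaptable_params(model, scope="part"):
    for name, p in model.named_parameters():
        p.requires_grad = (scope == "all") or name.startswith("speaker_layer")
    return [p for p in model.parameters() if p.requires_grad]

params = adaptable_params(model, scope="part")  # feed these to an optimizer
```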
no code implementations • 1 Apr 2019 • Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
When the available data of a target speaker is insufficient to train a high quality speaker-dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers and train a multi-speaker TTS model instead.
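One common way to realize this, sketched below under generic assumptions, is a shared acoustic model conditioned on a per-speaker embedding table, so low-resource speakers borrow statistical strength from data-rich ones; the names and dimensions are illustrative, not the paper's system.

```python
# Hedged sketch of multi-speaker conditioning: one shared acoustic model
# plus a per-speaker embedding table.
import torch
import torch.nn as nn

class MultiSpeakerAcousticModel(nn.Module):
    def __init__(self, n_speakers, ling_dim=256, spk_dim=64, feat_dim=80):
        super().__init__()
        self.spk_table = nn.Embedding(n_speakers, spk_dim)
        self.net = nn.Sequential(
            nn.Linear(ling_dim + spk_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim))

    def forward(self, ling, spk_id):  # ling: (batch, frames, ling_dim)
        spk = self.spk_table(spk_id)[:, None, :].expand(-1, ling.size(1), -1)
        return self.net(torch.cat([ling, spk], dim=-1))

model = MultiSpeakerAcousticModel(n_speakers=10)
out = model(torch.randn(2, 50, 256), torch.tensor([3, 7]))  # (2, 50, 80)
```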
no code implementations • 20 Aug 2018 • Hieu-Thi Luong, Junichi Yamagishi
Two new training schemes for the proposed architecture are also presented in this paper.
no code implementations • 2 Aug 2018 • Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
We investigated the impact of noisy linguistic features on the performance of a neural-network-based Japanese speech synthesis system that uses a WaveNet vocoder.
no code implementations • 31 Jul 2018 • Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu
In order to reduce the mismatched characteristics between natural and generated acoustic features, we propose frameworks that incorporate either a conditional generative adversarial network (GAN) or its variant, Wasserstein GAN with gradient penalty (WGAN-GP), into multi-speaker speech synthesis that uses the WaveNet vocoder.
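For reference, the WGAN-GP gradient penalty itself takes a standard form; the sketch below applies it to batches of acoustic feature frames and is a generic formulation, not the paper's exact setup.

```python
# Hedged sketch of the WGAN-GP gradient penalty on acoustic features.
import torch

def gradient_penalty(critic, real, fake):
    # interpolate between real and generated acoustic feature batches
    alpha = torch.rand(real.size(0), 1, 1)
    inter = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(inter)
    grads, = torch.autograd.grad(
        outputs=scores.sum(), inputs=inter, create_graph=True)
    # penalize deviation of the gradient norm from 1 (Lipschitz constraint)
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(50 * 80, 1))
gp = gradient_penalty(critic, torch.randn(4, 50, 80), torch.randn(4, 50, 80))
# full critic loss would be: critic(fake).mean() - critic(real).mean() + lambda_gp * gp
```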
no code implementations • 31 Jul 2018 • Hieu-Thi Luong, Junichi Yamagishi
Most neural-network based speaker-adaptive acoustic models for speech synthesis can be categorized into either layer-based or input-code approaches.
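Roughly, an input-code model appends a speaker code to the network input (as in the embedding-table sketch above), while a layer-based model keeps the input shared and routes hidden activations through per-speaker layers. The sketch below illustrates the layer-based variant with hypothetical modules.

```python
# Hedged sketch of a layer-based speaker-adaptive acoustic model:
# shared layers plus one adaptation layer per speaker; only the target
# speaker's layer is used (and updated during adaptation).
import torch
import torch.nn as nn

class LayerBasedModel(nn.Module):
    def __init__(self, n_speakers, in_dim=256, out_dim=80):
        super().__init__()
        self.shared = nn.Linear(in_dim, 256)
        self.spk_layers = nn.ModuleList(
            nn.Linear(256, 256) for _ in range(n_speakers))
        self.out = nn.Linear(256, out_dim)

    def forward(self, x, spk_id):
        h = torch.relu(self.shared(x))
        return self.out(torch.relu(self.spk_layers[spk_id](h)))

y = LayerBasedModel(n_speakers=4)(torch.randn(2, 256), spk_id=1)  # (2, 80)
```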
no code implementations • WS 2016 • Hieu-Thi Luong, Hai-Quan Vu
In this paper we describe a non-expert setup for a Vietnamese speech recognition system using the Kaldi toolkit.
Automatic Speech Recognition (ASR)