Search Results for author: Hieu-Thi Luong

Found 13 papers, 1 paper with code

Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

1 code implementation · 25 Jun 2024 · Duc-Tuan Truong, Ruijie Tao, Tuan Nguyen, Hieu-Thi Luong, Kong Aik Lee, Eng Siong Chng

Recent synthetic speech detectors that leverage the Transformer model outperform their convolutional neural network (CNN) counterparts.

Synthetic Speech Detection

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance

No code implementations · 25 Jun 2021 · Hieu-Thi Luong, Junichi Yamagishi

Generally speaking, the main objective when training a neural speech synthesis system is to synthesize natural and expressive speech from the output layer of the neural network without much attention given to the hidden layers.

Quantization, Speech Synthesis

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

No code implementations · 8 Oct 2020 · Hieu-Thi Luong, Junichi Yamagishi

Because the recently proposed voice cloning system NAUTILUS can clone unseen voices using untranscribed speech, we investigate the feasibility of using it to develop a unified cross-lingual TTS/VC system.

Voice Cloning, Voice Conversion

NAUTILUS: a Versatile Voice Cloning System

No code implementations · 22 May 2020 · Hieu-Thi Luong, Junichi Yamagishi

By using a multi-speaker speech corpus to train all requisite encoders and decoders in the initial training stage, our system can clone unseen voices using untranscribed speech of target speakers on the basis of the backpropagation algorithm.

Speech Synthesis, Voice Cloning

Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech

No code implementations · 14 Sep 2019 · Hieu-Thi Luong, Junichi Yamagishi

Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a similar objective: generating speech in a target voice.

Voice Conversion

A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation

No code implementations · 18 Jun 2019 · Hieu-Thi Luong, Junichi Yamagishi

In this study, we propose a novel speech synthesis model, which can be adapted to unseen speakers by fine-tuning part of or all of the network using either transcribed or untranscribed speech.

Decoder, Speech Synthesis

Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora

No code implementations · 1 Apr 2019 · Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa

When the available data for a target speaker are insufficient to train a high-quality speaker-dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers and train a multi-speaker TTS model instead.

Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects

No code implementations · 2 Aug 2018 · Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa

We investigated the impact of noisy linguistic features on the performance of a neural-network-based Japanese speech synthesis system that uses a WaveNet vocoder.

Denoising, Speech Synthesis

Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

No code implementations · 31 Jul 2018 · Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu

In order to reduce the mismatched characteristics between natural and generated acoustic features, we propose frameworks that incorporate either a conditional generative adversarial network (GAN) or its variant, Wasserstein GAN with gradient penalty (WGAN-GP), into multi-speaker speech synthesis that uses the WaveNet vocoder.

Generative Adversarial Network, Speech Synthesis

Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems

No code implementations · 31 Jul 2018 · Hieu-Thi Luong, Junichi Yamagishi

Most neural-network-based speaker-adaptive acoustic models for speech synthesis can be categorized as either layer-based or input-code approaches.

Speech Synthesis
