Search Results for author: Na Hu

Found 5 papers, 3 papers with code

Controllable Context-aware Conversational Speech Synthesis

no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pauses and prolongation, in the acoustic model, and develop a neural-network-based predictor to predict the occurrences of these two behaviors from text.

Speech Synthesis

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

no code implementations • 12 Feb 2021 • Peng Liu, Yuewen Cao, Songxiang Liu, Na Hu, Guangzhi Li, Chao Weng, Dan Su

This paper proposes VARA-TTS, a non-autoregressive (non-AR) text-to-speech (TTS) model that uses a very deep Variational Autoencoder (VDVAE) with a Residual Attention mechanism to refine the text-to-acoustic alignment layer by layer.

Speech Synthesis Text-To-Speech Synthesis

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, the speaker embedding is further decomposed into a weighted sum of a group of trainable vectors representing different timbre clusters.
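The weighted-sum decomposition described above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the cluster vectors and mixing logits below are random placeholders standing in for parameters that would be trained jointly with the conversion model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clusters, emb_dim = 8, 16
# Trainable vectors, one per timbre cluster (random placeholders here).
cluster_vectors = rng.standard_normal((n_clusters, emb_dim))

def timbre_embedding(logits, cluster_vectors):
    # Softmax over cluster logits gives non-negative weights summing to 1;
    # the speaker embedding is the corresponding weighted sum of cluster vectors.
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return w @ cluster_vectors

# Hypothetical per-speaker logits; in practice these would be predicted
# or learned per target speaker.
logits = rng.standard_normal(n_clusters)
emb = timbre_embedding(logits, cluster_vectors)
assert emb.shape == (emb_dim,)
```

Because the weights are a softmax, interpolating the logits smoothly interpolates between cluster timbres, which is what makes the conversion controllable.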

Audio Generation Disentanglement +1

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

2 code implementations • 11 Nov 2020 • Songxiang Liu, Yuewen Cao, Na Hu, Dan Su, Helen Meng

This paper presents FastSVC, a lightweight cross-domain singing voice conversion (SVC) system that achieves high conversion performance with an inference speed 4x faster than real time on CPUs.
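Feature-wise Linear Modulation (FiLM), named in the title, conditions one feature stream on another by applying a per-channel scale and shift. A minimal NumPy sketch of the operation follows; the shapes and the random gamma/beta are illustrative assumptions, since in FastSVC they would be produced by conditioning encoders rather than sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, gamma, beta):
    # Feature-wise Linear Modulation: scale and shift every channel of
    # `features` (batch, channels, frames) by conditioning-derived
    # per-channel gamma and beta (batch, channels).
    return gamma[:, :, None] * features + beta[:, :, None]

batch, channels, frames = 2, 4, 8
x = rng.standard_normal((batch, channels, frames))
# Placeholder conditioning outputs; in the paper's setting these would
# be computed from features such as pitch or linguistic content.
gamma = rng.standard_normal((batch, channels))
beta = rng.standard_normal((batch, channels))

y = film(x, gamma, beta)
assert y.shape == x.shape
```

The appeal of FiLM for a lightweight system is that the conditioning path adds only an elementwise multiply and add per channel, which is cheap at inference time.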

Voice Conversion

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

4 code implementations • 4 Sep 2019 • Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu

In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expressions simultaneously.

Speech Synthesis
