Search Results for author: Zhizheng Wu

Found 9 papers, 1 paper with code

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

no code implementations • 5 Mar 2024 • Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.

Quantization • Speech Synthesis
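The snippet above describes disentangling the speech latent into per-attribute subspaces, each with its own quantizer. A minimal toy sketch of factorized vector quantization, with made-up subspace dimensions and codebook sizes (not NaturalSpeech 3's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one codebook per attribute subspace.
SUBSPACES = ["content", "prosody", "timbre", "details"]
DIM = 8             # per-subspace latent dimension (illustrative)
CODEBOOK_SIZE = 16  # entries per codebook (illustrative)

codebooks = {name: rng.normal(size=(CODEBOOK_SIZE, DIM)) for name in SUBSPACES}

def factorized_quantize(latent):
    """Split a latent vector into attribute subspaces and quantize each
    chunk to its nearest codebook entry (Euclidean distance)."""
    chunks = np.split(latent, len(SUBSPACES))
    indices, quantized = {}, []
    for name, z in zip(SUBSPACES, chunks):
        dists = np.linalg.norm(codebooks[name] - z, axis=1)
        idx = int(np.argmin(dists))
        indices[name] = idx
        quantized.append(codebooks[name][idx])
    return indices, np.concatenate(quantized)

latent = rng.normal(size=DIM * len(SUBSPACES))
idx, zq = factorized_quantize(latent)
print(idx)        # one codebook index per subspace
print(zq.shape)   # (32,)
```

Each subspace yields its own discrete token, which is what lets a downstream model (here, the factorized diffusion model) condition each attribute on its own prompt.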

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

no code implementations • 22 Jan 2024 • Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text.

AudioCaps • Audio-Visual Synchronization • +4
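The query-encoder idea above (a fixed set of learnable query embeddings attending over audio-visual features) can be sketched as single-head cross-attention; the shapes below are illustrative toy values, not CoAVT's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_encoder(queries, av_feats):
    """Single-head cross-attention: each query attends over the
    audio-visual feature sequence and returns a summary vector.
    queries:  (num_queries, d)  -- learnable, trained in practice
    av_feats: (seq_len, d)
    returns:  (num_queries, d)
    """
    d = queries.shape[-1]
    attn = softmax(queries @ av_feats.T / np.sqrt(d))  # (num_queries, seq_len)
    return attn @ av_feats

num_queries, seq_len, d = 4, 10, 16
learnable_queries = rng.normal(size=(num_queries, d))  # random here; learned in training
av_features = rng.normal(size=(seq_len, d))
summary = query_encoder(learnable_queries, av_features)
print(summary.shape)  # (4, 16)
```

Because the number of queries is fixed, the variable-length audio-visual sequence is compressed into a fixed-size set of vectors that can be aligned with the text representation.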

Accented Text-to-Speech Synthesis with Limited Data

no code implementations • 8 May 2023 • Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li

Both objective and subjective evaluation results show that the accented TTS front-end, fine-tuned with a small accented phonetic lexicon (5k words), effectively handles the phonetic variation of accents, while the accented TTS acoustic model, fine-tuned with a limited amount of accented speech data (approximately 3 minutes), effectively improves the prosodic rendering, including pitch and duration.

Speech Synthesis • Text-To-Speech Synthesis

TTS-Guided Training for Accent Conversion Without Parallel Data

no code implementations • 20 Dec 2022 • Yi Zhou, Zhizheng Wu, Mingyang Zhang, Xiaohai Tian, Haizhou Li

Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data.

Building a mixed-lingual neural TTS system with only monolingual data

no code implementations • 12 Apr 2019 • Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu

When deploying a Chinese neural text-to-speech (TTS) synthesis system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded.

Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training

no code implementations • 22 Feb 2016 • Zhizheng Wu, Simon King

We propose two novel techniques, stacking bottleneck features and a minimum generation error training criterion, to improve the performance of deep neural network (DNN)-based speech synthesis.

Speech Synthesis
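The core of the bottleneck-stacking idea is to take frame-level bottleneck activations from a first network and concatenate them over a window of neighbouring frames as extra input to a second network. A minimal sketch of the stacking step, with an illustrative context window (the paper's exact window size and dimensions are not assumed here):

```python
import numpy as np

def stack_bottleneck(bottleneck, context=2):
    """Concatenate each frame's bottleneck features with those of its
    `context` neighbours on each side; edge frames are padded by repetition.
    bottleneck: (num_frames, d)
    returns:    (num_frames, (2*context + 1) * d)
    """
    T, d = bottleneck.shape
    padded = np.pad(bottleneck, ((context, context), (0, 0)), mode="edge")
    # Column block i holds the frames at time offset (i - context).
    return np.concatenate(
        [padded[i:i + T] for i in range(2 * context + 1)], axis=1)

bn = np.arange(12, dtype=float).reshape(6, 2)  # 6 frames, 2-dim bottleneck
stacked = stack_bottleneck(bn, context=2)
print(stacked.shape)  # (6, 10)
```

The stacked features give the second network explicit access to a trajectory of compact representations across time, which is the mechanism for improving trajectory modelling in a frame-by-frame DNN.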

Spoofing detection under noisy conditions: a preliminary investigation and an initial database

no code implementations • 9 Feb 2016 • Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li

To simulate the real-life scenarios, we perform a preliminary investigation of spoofing detection under additive noisy conditions, and also describe an initial database for this task.

Speaker Verification

Investigating gated recurrent neural networks for speech synthesis

no code implementations • 11 Jan 2016 • Zhizheng Wu, Simon King

Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS).

Speech Synthesis
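The gating mechanism this paper investigates can be illustrated with a minimal GRU cell in numpy; the weights below are random stand-ins (in an SPSS acoustic model they would be trained to map linguistic features to acoustic features), and the dimensions are toy values:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal gated recurrent unit: update and reset gates control how
    much of the previous hidden state is kept versus overwritten."""
    def __init__(self, input_dim, hidden_dim):
        shape = (hidden_dim, input_dim + hidden_dim)
        self.Wz = rng.normal(scale=0.1, size=shape)  # update gate
        self.Wr = rng.normal(scale=0.1, size=shape)  # reset gate
        self.Wh = rng.normal(scale=0.1, size=shape)  # candidate state

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)            # how much to update
        r = sigmoid(self.Wr @ xh)            # how much history to reuse
        h_cand = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_cand      # convex blend of old and new

cell = GRUCell(input_dim=5, hidden_dim=8)
h = np.zeros(8)
for t in range(10):                          # run over a 10-frame sequence
    h = cell.step(rng.normal(size=5), h)
print(h.shape)  # (8,)
```

The recurrence is what lets the model carry context across frames, which is the property that makes gated RNNs attractive as acoustic models for statistical parametric speech synthesis.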
