Search Results for author: Zhizheng Wu

Found 9 papers, 1 paper with code

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

no code implementations • 5 Mar 2024 • Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.

Quantization • Speech Synthesis
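The snippet above describes disentangling the speech latent into per-attribute subspaces, each with its own quantizer. A minimal toy sketch of factorized vector quantization, with made-up subspace dimensions and codebook sizes (not NaturalSpeech 3's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one codebook per attribute subspace.
SUBSPACES = ["content", "prosody", "timbre", "details"]
DIM = 8             # per-subspace latent dimension (illustrative)
CODEBOOK_SIZE = 16  # entries per codebook (illustrative)

codebooks = {name: rng.normal(size=(CODEBOOK_SIZE, DIM)) for name in SUBSPACES}

def factorized_quantize(latent):
    """Split a latent vector into attribute subspaces and quantize each
    chunk to its nearest codebook entry (Euclidean distance)."""
    chunks = np.split(latent, len(SUBSPACES))
    indices, quantized = {}, []
    for name, z in zip(SUBSPACES, chunks):
        dists = np.linalg.norm(codebooks[name] - z, axis=1)
        idx = int(np.argmin(dists))
        indices[name] = idx
        quantized.append(codebooks[name][idx])
    return indices, np.concatenate(quantized)

latent = rng.normal(size=DIM * len(SUBSPACES))
idx, zq = factorized_quantize(latent)
print(idx)        # one codebook index per subspace
print(zq.shape)   # (32,)
```

Each subspace yields its own discrete token, which is what lets a downstream model (here, the factorized diffusion model) condition each attribute on its own prompt.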

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

no code implementations • 22 Jan 2024 • Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text.

AudioCaps • Audio-Visual Synchronization • +4
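The query-encoder idea above (a fixed set of learnable query embeddings attending over audio-visual features) can be sketched as single-head cross-attention; the shapes below are illustrative toy values, not CoAVT's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_encoder(queries, av_feats):
    """Single-head cross-attention: each query attends over the
    audio-visual feature sequence and returns a summary vector.
    queries:  (num_queries, d)  -- learnable, trained in practice
    av_feats: (seq_len, d)
    returns:  (num_queries, d)
    """
    d = queries.shape[-1]
    attn = softmax(queries @ av_feats.T / np.sqrt(d))  # (num_queries, seq_len)
    return attn @ av_feats

num_queries, seq_len, d = 4, 10, 16
learnable_queries = rng.normal(size=(num_queries, d))  # random here; learned in training
av_features = rng.normal(size=(seq_len, d))
summary = query_encoder(learnable_queries, av_features)
print(summary.shape)  # (4, 16)
```

Because the number of queries is fixed, the variable-length audio-visual sequence is compressed into a fixed-size set of vectors that can be aligned with the text representation.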

Accented Text-to-Speech Synthesis with Limited Data

no code implementations • 8 May 2023 • Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li

Both objective and subjective evaluation results show that the accented TTS front-end, fine-tuned with a small accented phonetic lexicon (5k words), effectively handles the phonetic variation of accents, while the accented TTS acoustic model, fine-tuned with a limited amount of accented speech data (approximately 3 minutes), effectively improves the prosodic rendering, including pitch and duration.

Speech Synthesis • Text-To-Speech Synthesis

TTS-Guided Training for Accent Conversion Without Parallel Data

no code implementations • 20 Dec 2022 • Yi Zhou, Zhizheng Wu, Mingyang Zhang, Xiaohai Tian, Haizhou Li

Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data.

Building a mixed-lingual neural TTS system with only monolingual data

no code implementations • 12 Apr 2019 • Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu

When deploying a Chinese neural text-to-speech (TTS) synthesis system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded.

Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training

no code implementations • 22 Feb 2016 • Zhizheng Wu, Simon King

We propose two novel techniques, stacking bottleneck features and a minimum generation error training criterion, to improve the performance of deep neural network (DNN)-based speech synthesis.

Speech Synthesis
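The core of the bottleneck-stacking idea is to take frame-level bottleneck activations from a first network and concatenate them over a window of neighbouring frames as extra input to a second network. A minimal sketch of the stacking step, with an illustrative context window (the paper's exact window size and dimensions are not assumed here):

```python
import numpy as np

def stack_bottleneck(bottleneck, context=2):
    """Concatenate each frame's bottleneck features with those of its
    `context` neighbours on each side; edge frames are padded by repetition.
    bottleneck: (num_frames, d)
    returns:    (num_frames, (2*context + 1) * d)
    """
    T, d = bottleneck.shape
    padded = np.pad(bottleneck, ((context, context), (0, 0)), mode="edge")
    # Column block i holds the frames at time offset (i - context).
    return np.concatenate(
        [padded[i:i + T] for i in range(2 * context + 1)], axis=1)

bn = np.arange(12, dtype=float).reshape(6, 2)  # 6 frames, 2-dim bottleneck
stacked = stack_bottleneck(bn, context=2)
print(stacked.shape)  # (6, 10)
```

The stacked features give the second network explicit access to a trajectory of compact representations across time, which is the mechanism for improving trajectory modelling in a frame-by-frame DNN.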

Spoofing detection under noisy conditions: a preliminary investigation and an initial database

no code implementations • 9 Feb 2016 • Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li

To simulate the real-life scenarios, we perform a preliminary investigation of spoofing detection under additive noisy conditions, and also describe an initial database for this task.

Speaker Verification

Investigating gated recurrent neural networks for speech synthesis

no code implementations • 11 Jan 2016 • Zhizheng Wu, Simon King

Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS).

Speech Synthesis
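The gating mechanism this paper investigates can be illustrated with a minimal GRU cell in numpy; the weights below are random stand-ins (in an SPSS acoustic model they would be trained to map linguistic features to acoustic features), and the dimensions are toy values:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal gated recurrent unit: update and reset gates control how
    much of the previous hidden state is kept versus overwritten."""
    def __init__(self, input_dim, hidden_dim):
        shape = (hidden_dim, input_dim + hidden_dim)
        self.Wz = rng.normal(scale=0.1, size=shape)  # update gate
        self.Wr = rng.normal(scale=0.1, size=shape)  # reset gate
        self.Wh = rng.normal(scale=0.1, size=shape)  # candidate state

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)            # how much to update
        r = sigmoid(self.Wr @ xh)            # how much history to reuse
        h_cand = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_cand      # convex blend of old and new

cell = GRUCell(input_dim=5, hidden_dim=8)
h = np.zeros(8)
for t in range(10):                          # run over a 10-frame sequence
    h = cell.step(rng.normal(size=5), h)
print(h.shape)  # (8,)
```

The recurrence is what lets the model carry context across frames, which is the property that makes gated RNNs attractive as acoustic models for statistical parametric speech synthesis.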
