Search Results for author: Zhifang Guo

Found 5 papers, 1 paper with code

PromptTTS 2: Describing and Generating Voices with Text Prompt

no code implementations • 5 Sep 2023 • Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech.

Language Modelling, Large Language Model

Audio Generation with Multiple Conditional Diffusion Model

no code implementations • 23 Aug 2023 • Zhifang Guo, Jianguo Mao, Rui Tao, Long Yan, Kazushige Ouchi, Hong Liu, Xiangdong Wang

To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and energy contour) as supplements to the text.

Audio Generation, Language Modelling, +1

Furnishing Sound Event Detection with Language Model Abilities

no code implementations • 22 Aug 2023 • Hualei Wang, Jianguo Mao, Zhifang Guo, Jiarui Wan, Hong Liu, Xiangdong Wang

Recently, the ability of language models (LMs) has attracted increasing attention in visual cross-modality.

Decoder, Event Detection, +2

PromptTTS: Controllable Text-to-Speech with Text Descriptions

no code implementations • 22 Nov 2022 • Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan

Thus, we develop a text-to-speech (TTS) system (dubbed as PromptTTS) that takes a prompt with both style and content descriptions as input to synthesize the corresponding speech.

Decoder, Speech Synthesis
