Search Results for author: Minghui Fang

Found 6 papers, 4 papers with code

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

1 code implementation • 29 Aug 2024 • Shengpeng Ji, Ziyue Jiang, Xize Cheng, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, RuiQi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Wen Wang, Zhou Zhao

Despite the reduced number of tokens, WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information.

Language Modelling

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

no code implementations • 25 Jun 2024 • Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries.

Cross-Modal Retrieval, Natural Language Queries, +2

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

1 code implementation • 3 Jun 2024 • Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt.

Speech Synthesis, Text to Speech

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

1 code implementation • 28 Aug 2023 • Shengpeng Ji, Jialong Zuo, Minghui Fang, Ziyue Jiang, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

The dataset comprises 236,220 pairs of style prompts in natural text descriptions with five style factors and corresponding speech samples.

Language Modelling, Text to Speech
