no code implementations • 6 Jun 2024 • Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li
Recent advances in large language models (LLMs) and the development of audio codecs have greatly propelled zero-shot TTS.
no code implementations • 10 Mar 2024 • Yayue Deng, Mohan Xu, Yao Tang
The effectiveness of central bank communication is a crucial aspect of monetary policy transmission.
1 code implementation • 2 Jan 2024 • Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li
Drawing inspiration from state-of-the-art Text-to-Image (T2I) diffusion models, we introduce Auffusion, a Text-to-Audio (TTA) system that adapts T2I model frameworks to the TTA task by effectively leveraging their inherent generative strengths and precise cross-modal alignment.
Ranked #6 on Audio Generation on AudioCaps
1 code implementation • 27 Dec 2023 • Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, Jinlong Xue, Yichen Han, Ya Li
To address this problem, we propose a frame-level emotional state alignment method for speech emotion recognition (SER).
no code implementations • 16 Dec 2023 • Yayue Deng, Jinlong Xue, Yukang Jia, Qifei Li, Yichen Han, Fengping Wang, Yingming Gao, Dengfeng Ke, Ya Li
In this paper, we introduce CONCSS, a contrastive learning-based conversational speech synthesis (CSS) framework.
no code implementations • 5 Jun 2023 • Dengfeng Ke, Yayue Deng, Yukang Jia, Jinlong Xue, Qi Luo, Ya Li, Jianqing Sun, Jiaen Liang, Binghuai Lin
A regressive Text-to-Speech (TTS) system uses an attention mechanism to generate the alignment between the text and the acoustic feature sequence.
no code implementations • 3 May 2023 • Jinlong Xue, Yayue Deng, Fengping Wang, Ya Li, Yingming Gao, JianHua Tao, Jianqing Sun, Jiaen Liang
However, comprehensively modeling the conversation remains a challenge: most conversational TTS systems focus only on extracting global information and omit local prosody features, which carry important fine-grained information such as keywords and emphasis.
1 code implementation • 20 Mar 2022 • Jinlong Xue, Yayue Deng, Yichen Han, Ya Li, Jianqing Sun, Jiaen Liang
In recent years, neural-network-based methods for multi-speaker text-to-speech synthesis (TTS) have made significant progress.