Search Results for author: Yujia Xiao

Found 5 papers, 3 papers with code

PodAgent: A Comprehensive Framework for Podcast Generation

1 code implementation · 1 Mar 2025 · Yujia Xiao, Lei He, Haohan Guo, Fenglong Xie, Tan Lee

The key challenges lie in in-depth content generation and appropriate, expressive voice production.

Audio Generation Speech Synthesis

Audio-FLAN: A Preliminary Release

1 code implementation · 23 Feb 2025 · Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue

Audio-FLAN lays the foundation for unified audio-language models that can seamlessly handle both understanding (e.g., transcription, comprehension) and generation (e.g., speech, music, sound) tasks across a wide range of audio domains in a zero-shot manner.

Zero-Shot Learning

ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training

no code implementations · 8 Jan 2025 · Xinfa Zhu, Lei He, Yujia Xiao, Xi Wang, Xu Tan, Sheng Zhao, Lei Xie

Style voice conversion aims to transform the speaking style of source speech into a desired style while keeping the original speaker's identity.

In-Context Learning Voice Conversion

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

1 code implementation · 31 Aug 2023 · Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng

This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS quality with lower supervised data requirements via Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL) utilizing more unlabeled speech audio.

Representation Learning Speech Representation Learning +4

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

no code implementations · 3 Jul 2023 · Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee

Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.

Form Sentence +1
