no code implementations • 19 Dec 2023 • Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng
Both objective and subjective evaluations demonstrate that our proposed method can effectively improve the naturalness and expressiveness of the synthesized speech in audiobook synthesis especially for the role and out-of-domain scenarios.
no code implementations • 7 Sep 2023 • Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer
In this work, we present a system that can automatically generate high-quality audiobooks from online e-books.
no code implementations • 6 Sep 2023 • Zhihang Xu, Shaofei Zhang, Xi Wang, Jiajun Zhang, Wenning Wei, Lei He, Sheng Zhao
In this paper, we present MuLanTTS, the Microsoft end-to-end neural text-to-speech (TTS) system designed for the Blizzard Challenge 2023.
no code implementations • 3 Jul 2023 • Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee
Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.
no code implementations • 14 Sep 2022 • Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie
To alleviate the difficulty in training, we propose to model linguistic and prosodic information by considering cross-sentence, embedded structure in training.
no code implementations • 25 Jun 2022 • Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie
In this paper, we propose a novel framework for learning style representation from abundant plain text in a self-supervised manner.
no code implementations • MediaEval 2015 Workshop 2015 • Jingyong Hou, Van Tung Pham, Cheung-Chi Leung, Lei Wang, HaiHua Xu, Hang Lv, Lei Xie, Zhonghua Fu, Chongjia Ni, Xiong Xiao, Hongjie Chen, Shaofei Zhang, Sining Sun, Yougen Yuan, Pengcheng Li, Tin Lay Nwe, Sunil Sivadas, Bin Ma, Eng Siong Chng, Haizhou Li
This paper describes the system developed by the NNI team for the Query-by-Example Search on Speech Task (QUESST) in the MediaEval 2015 evaluation.
Ranked #9 on Keyword Spotting on QUESST