no code implementations • 6 Jul 2022 • Bin Su, Shaoguang Mao, Frank Soong, Zhiyong Wu
The ORARS addresses the MOS prediction problem by pairing a test sample with each of the pre-scored anchored reference samples.
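As a rough illustration of the pairing step described above, the following Python sketch pairs one test sample with every pre-scored anchor. The excerpt does not say how the pairs are compared or combined, so the `is_better` comparator, the data layout, and the "lowest anchor score plus number of wins" aggregation are all assumptions made for illustration, not the ORARS model itself.

```python
from typing import Callable, List, Tuple

# (sample id, pre-assigned MOS); this layout is hypothetical.
Anchor = Tuple[str, float]


def pair_with_anchors(test_id: str, anchors: List[Anchor]) -> List[Tuple[str, str, float]]:
    """Pair the test sample with each pre-scored anchored reference sample."""
    return [(test_id, ref_id, ref_mos) for ref_id, ref_mos in anchors]


def predict_mos(test_id: str, anchors: List[Anchor],
                is_better: Callable[[str, str], bool]) -> float:
    """Assumed aggregation: lowest anchor score plus the number of anchors beaten.

    This only makes sense when the anchors sit one MOS point apart.
    """
    pairs = pair_with_anchors(test_id, anchors)
    wins = sum(1 for test, ref, _ in pairs if is_better(test, ref))
    lowest = min(ref_mos for _, _, ref_mos in pairs)
    return lowest + wins


# Toy stand-in for a learned pairwise comparator: it looks up a hidden quality value.
hidden_quality = {"test": 2.5, "a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
anchors = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 4.0)]


def toy_is_better(test_id: str, ref_id: str) -> bool:
    return hidden_quality[test_id] > hidden_quality[ref_id]


predicted = predict_mos("test", anchors, toy_is_better)
```

In practice the comparator would be a trained model operating on audio features; the lookup table here only keeps the sketch self-contained.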
2 code implementations • 9 May 2022 • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu
In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Ranked #1 on Text-To-Speech Synthesis on LJSpeech
no code implementations • 14 Oct 2021 • Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu
These embeddings, when used as implicit phonetic supplementary information, can alleviate the data shortage of explicit phoneme annotations.
1 code implementation • 29 Jun 2021 • Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in the speech, language, and machine learning communities and has broad applications in industry.
no code implementations • 26 Oct 2020 • Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu
Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weaknesses in assessing a speech utterance: 1) phoneme GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) the rank-ordering information has not been well exploited in GOP scoring to deliver a robust assessment that correlates well with a human rater's evaluations.
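To make the first weakness concrete, here is a minimal Python sketch of the commonly cited frame-normalized GOP definition (the absolute log-likelihood difference between the target phoneme and the best-scoring phoneme, divided by the segment length), followed by a plain average over phonemes. The function names and the numbers are illustrative assumptions and do not come from the paper.

```python
from typing import List


def gop(target_loglik: float, all_logliks: List[float], num_frames: int) -> float:
    """Frame-normalized GOP for one phoneme segment.

    all_logliks holds the segment log-likelihood under every phoneme,
    including the target, so the best score is never below the target's.
    """
    best = max(all_logliks)
    return abs(target_loglik - best) / num_frames


def sentence_score_by_average(phoneme_gops: List[float]) -> float:
    """Naive sentence score: the plain mean of the phoneme GOP values."""
    return sum(phoneme_gops) / len(phoneme_gops)


# One segment: the target scores -42.0, the best competitor -40.0, over 10 frames.
single = gop(-42.0, [-42.0, -40.0, -55.0], 10)

# Two made-up utterances with very different error patterns.
one_bad_phoneme = [0.0, 0.0, 0.0, 2.0]   # three clean phonemes, one severe error
uniformly_weak = [0.5, 0.5, 0.5, 0.5]    # every phoneme slightly off

avg_one_bad = sentence_score_by_average(one_bad_phoneme)
avg_uniform = sentence_score_by_average(uniformly_weak)
```

Both toy utterances receive the same average even though a human rater would likely judge them differently, which is one way a simple mean can lose information.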
1 code implementation • 31 Jan 2020 • Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).
no code implementations • 13 Dec 2018 • Yan Deng, Lei He, Frank Soong
Neural TTS has been shown to generate high-quality synthesized speech.