Search Results for author: Frank Soong

Found 7 papers, 3 papers with code

Ordinal Regression via Binary Preference vs Simple Regression: Statistical and Experimental Perspectives

no code implementations • 6 Jul 2022 • Bin Su, Shaoguang Mao, Frank Soong, Zhiyong Wu

The ORARS addresses the MOS prediction problem by pairing a test sample with each of the pre-scored anchored reference samples.


NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

2 code implementations • 9 May 2022 • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

Speech Synthesis • Text-To-Speech Synthesis

An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings

no code implementations • 14 Oct 2021 • Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu

These embeddings, when used as implicit phonetic supplementary information, can alleviate the data shortage of explicit phoneme annotations.

A Survey on Neural Speech Synthesis

1 code implementation • 29 Jun 2021 • Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry.

Speech Synthesis

Improving pronunciation assessment via ordinal regression with anchored reference samples

no code implementations • 26 Oct 2020 • Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu

Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weaknesses in assessing a speech utterance: 1) phoneme GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) the rank-ordering information has not been well exploited in GOP scoring to deliver a robust assessment that correlates well with a human rater's evaluations.


Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

1 code implementation • 31 Jan 2020 • Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).

Quantization • Speech Synthesis
