Tacotron: Towards End-to-End Speech Synthesis

29 Mar 2017 · Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices...


Evaluation results from the paper

Task | Dataset | Model | Metric name | Metric value | Global rank
Speech Synthesis | North American English | Tacotron | Mean Opinion Score | 4.001 | #4