FastSpeech: Fast, Robust and Controllable Text to Speech

NeurIPS 2019 · Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate a mel-spectrogram from text, and then synthesize speech from the mel-spectrogram using a vocoder such as WaveNet...

TASK                      DATASET   MODEL                        METRIC NAME        METRIC VALUE  GLOBAL RANK
Text-To-Speech Synthesis  LJSpeech  Merlin                       Audio Quality MOS  2.4           # 3
Text-To-Speech Synthesis  LJSpeech  FastSpeech (Mel + WaveGlow)  Audio Quality MOS  3.84          # 2

Methods used in the Paper