Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

12 May 2020 · Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer. Flowtron borrows insights from IAF and revamps Tacotron in order to provide high-quality and expressive mel-spectrogram synthesis...

Results from the Paper


State of the art for Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric)

TASK                      DATASET   MODEL       METRIC NAME       METRIC VALUE  GLOBAL RANK
Text-To-Speech Synthesis  LJSpeech  Flowtron    Pleasantness MOS  3.665         # 1
Text-To-Speech Synthesis  LJSpeech  Tacotron 2  Pleasantness MOS  3.521         # 2
