This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.
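As a rough illustration of Tacotron 2's two-stage design, the sketch below shows an attention-based sequence-to-sequence network that predicts mel-spectrogram frames from characters under teacher forcing. It is a minimal stand-in, not the official implementation: dot-product attention replaces the paper's location-sensitive attention, the pre-net/post-net are reduced to single layers, and the WaveNet vocoder stage is omitted; all names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class Seq2SeqMelPredictor(nn.Module):
    """Minimal stand-in for Tacotron 2's first stage: encode characters,
    attend over the encoding, and decode mel frames (teacher-forced here).
    The WaveNet vocoder that turns mels into audio is omitted."""

    def __init__(self, vocab_size=148, dim=512, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim // 2, batch_first=True,
                               bidirectional=True)
        self.prenet = nn.Linear(n_mels, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.decoder = nn.LSTM(2 * dim, dim, batch_first=True)
        self.mel_out = nn.Linear(dim, n_mels)

    def forward(self, chars, mels):
        memory, _ = self.encoder(self.embed(chars))        # (B, T_in, dim)
        # Teacher forcing: each step sees the previous ground-truth frame.
        prev = nn.functional.pad(mels, (0, 0, 1, 0))[:, :-1]
        query = self.prenet(prev)                          # (B, T_out, dim)
        context, _ = self.attn(query, memory, memory)
        hidden, _ = self.decoder(torch.cat([query, context], dim=-1))
        return self.mel_out(hidden)                        # (B, T_out, n_mels)

model = Seq2SeqMelPredictor()
chars = torch.randint(0, 148, (2, 30))        # dummy character IDs
mels = torch.randn(2, 100, 80)                # dummy target spectrogram
assert model(chars, mels).shape == (2, 100, 80)
```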
Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).
In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth targets instead of the simplified output from a teacher model, and 2) introducing more variation information of speech (e.g., pitch, energy, and more accurate duration) as conditional inputs.
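A minimal sketch of the variance-conditioning idea follows, under the assumption that pitch and energy are normalized per utterance (so fixed quantization bins over roughly [-4, 4] make sense): scalar predictors for duration, pitch, and energy would be trained against ground-truth values, quantized pitch and energy are added back as embeddings, and each position is expanded to frame length by its duration. The class names echo the paper's terminology, but this is not the official FastSpeech 2 code, and the MSE training losses for the predictors are omitted.

```python
import torch
import torch.nn as nn

class VariancePredictor(nn.Module):
    """Two conv layers plus a linear head: one scalar per input position."""
    def __init__(self, dim=256):
        super().__init__()
        self.conv1 = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.out = nn.Linear(dim, 1)

    def forward(self, h):                                  # h: (B, T, dim)
        x = torch.relu(self.conv1(h.transpose(1, 2)))
        x = torch.relu(self.conv2(x)).transpose(1, 2)
        return self.out(x).squeeze(-1)                     # (B, T)

class VarianceAdaptor(nn.Module):
    """Condition hidden states on pitch/energy and expand them to frame
    resolution. Ground-truth values are fed during training; at inference
    the predictors take over."""
    def __init__(self, dim=256, n_bins=256):
        super().__init__()
        self.duration = VariancePredictor(dim)
        self.pitch = VariancePredictor(dim)
        self.energy = VariancePredictor(dim)
        self.pitch_emb = nn.Embedding(n_bins, dim)
        self.energy_emb = nn.Embedding(n_bins, dim)
        # Assumes pitch/energy are normalized to roughly [-4, 4].
        self.register_buffer("bins", torch.linspace(-4, 4, n_bins - 1))

    def forward(self, h, gt_pitch=None, gt_energy=None, gt_dur=None):
        log_dur = self.duration(h)
        pitch = gt_pitch if gt_pitch is not None else self.pitch(h)
        energy = gt_energy if gt_energy is not None else self.energy(h)
        h = h + self.pitch_emb(torch.bucketize(pitch, self.bins))
        h = h + self.energy_emb(torch.bucketize(energy, self.bins))
        dur = gt_dur if gt_dur is not None else \
            torch.clamp(torch.round(log_dur.exp() - 1), min=0).long()
        # Length regulator: repeat each position by its frame count.
        frames = [hi.repeat_interleave(di, dim=0) for hi, di in zip(h, dur)]
        return nn.utils.rnn.pad_sequence(frames, batch_first=True), log_dur
```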
In this paper, we show that it is possible to train GANs reliably to generate high-quality coherent waveforms by introducing a set of architectural changes and simple training techniques.
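To make the training recipe concrete, here is a minimal sketch of one adversarial update with the hinge objective that MelGAN adopts. `G` (mel-spectrogram to waveform) and `D` (waveform to score) are placeholders for the paper's generator and multi-scale discriminator, and the feature-matching loss the paper adds to the generator objective is left out.

```python
import torch
import torch.nn.functional as F

def adversarial_step(G, D, mel, real_audio, opt_g, opt_d):
    """One hinge-loss GAN update; G: mel -> waveform, D: waveform -> score."""
    fake_audio = G(mel)

    # Discriminator: push real scores above +1 and fake scores below -1.
    opt_d.zero_grad()
    d_loss = (F.relu(1.0 - D(real_audio)).mean()
              + F.relu(1.0 + D(fake_audio.detach())).mean())
    d_loss.backward()
    opt_d.step()

    # Generator: raise the (updated) discriminator's score on fake audio.
    opt_g.zero_grad()
    g_loss = -D(fake_audio).mean()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```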
In this work, we propose a novel feed-forward network based on the Transformer to generate mel-spectrograms in parallel for TTS.
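The parallelism comes from replacing the autoregressive decoder with self-attention blocks on both the phoneme and frame sides, joined by duration-based upsampling. Below is a hedged sketch using stock PyTorch Transformer layers; the paper's FFT blocks use 1-D convolutions in place of the position-wise linear layers, and positional encodings and the duration predictor are omitted here.

```python
import torch
import torch.nn as nn

class FeedForwardTransformerTTS(nn.Module):
    """FastSpeech-style sketch: self-attention over phonemes, expand by
    duration, self-attention over frames, project to mels -- no step-by-step
    decoding, so every frame is produced in a single forward pass."""
    def __init__(self, vocab=100, dim=256, n_mels=80):
        super().__init__()
        def stack():
            layer = nn.TransformerEncoderLayer(dim, nhead=2,
                                               dim_feedforward=1024,
                                               batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=4)
        self.embed = nn.Embedding(vocab, dim)
        self.phoneme_blocks = stack()
        self.frame_blocks = stack()
        self.to_mel = nn.Linear(dim, n_mels)

    def forward(self, phonemes, durations):
        h = self.phoneme_blocks(self.embed(phonemes))      # (B, T_ph, dim)
        # Length regulator: repeat each phoneme state `duration` times.
        frames = [hi.repeat_interleave(di, dim=0)
                  for hi, di in zip(h, durations)]
        h = nn.utils.rnn.pad_sequence(frames, batch_first=True)
        return self.to_mel(self.frame_blocks(h))           # (B, T_fr, n_mels)

model = FeedForwardTransformerTTS()
ph = torch.randint(0, 100, (2, 12))       # dummy phoneme IDs
dur = torch.randint(1, 6, (2, 12))        # per-phoneme frame counts
mel = model(ph, dur)                      # all frames produced at once
```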