Text-To-Speech Synthesis

39 papers with code • 2 benchmarks • 6 datasets

Converting written text in natural language to speech.

Greatest papers with code

Efficient Neural Audio Synthesis

CorentinJ/Real-Time-Voice-Cloning ICML 2018

The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.

Speech Synthesis Text-To-Speech Synthesis

Tacotron: Towards End-to-End Speech Synthesis

CorentinJ/Real-Time-Voice-Cloning 29 Mar 2017

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

Speech Synthesis Text-To-Speech Synthesis

WaveGrad: Estimating Gradients for Waveform Generation

coqui-ai/TTS ICLR 2021

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density.

Speech Synthesis Text-To-Speech Synthesis

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

coqui-ai/TTS 25 Oct 2019

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.

Speech Synthesis Text-To-Speech Synthesis

FastSpeech: Fast, Robust and Controllable Text to Speech

coqui-ai/TTS NeurIPS 2019

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.

Speech Quality Speech Synthesis +1

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

coqui-ai/TTS 24 Oct 2017

This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without use of any recurrent units.

Text-To-Speech Synthesis

FastSpeech: Fast, Robust and Controllable Text-to-Speech

TensorSpeech/TensorflowTTS 22 May 2019

Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).

Speech Quality Text-To-Speech Synthesis

Neural Speech Synthesis with Transformer Network

PaddlePaddle/PaddleSpeech 19 Sep 2018

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long dependencies with current recurrent neural networks (RNNs).

Machine Translation Speech Synthesis +1

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

NVIDIA/flowtron ICLR 2021

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric)

Speech Quality Speech Synthesis +2