About

Text-to-speech synthesis is the task of converting written natural-language text into speech.

Greatest papers with code

Efficient Neural Audio Synthesis

ICML 2018 CorentinJ/Real-Time-Voice-Cloning

The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.

SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

Tacotron: Towards End-to-End Speech Synthesis

29 Mar 2017 CorentinJ/Real-Time-Voice-Cloning

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS
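
The three stages named in the Tacotron abstract above (text analysis frontend, acoustic model, audio synthesis module) can be pictured as a chain of functions. The following is only a toy sketch of that data flow, not code from any listed paper or repository: the function names, the one-frame-per-character mapping, the sine-tone "vocoder", and the constants SAMPLE_RATE and FRAME_SAMPLES are all assumptions made for illustration.

```python
import math
from typing import List

SAMPLE_RATE = 16000   # assumed sample rate, illustration only
FRAME_SAMPLES = 200   # assumed number of audio samples per acoustic frame


def text_frontend(text: str) -> List[str]:
    """Text analysis stand-in: lowercase and keep only a-z and spaces."""
    return [ch for ch in text.lower() if "a" <= ch <= "z" or ch == " "]


def acoustic_model(symbols: List[str]) -> List[float]:
    """Acoustic model stand-in: one frame per symbol, each frame a pitch in Hz.

    A real acoustic model would predict spectrogram frames; a single
    frequency per frame keeps the sketch short. Spaces map to silence (0 Hz).
    """
    return [0.0 if s == " " else 200.0 + 10.0 * (ord(s) - ord("a")) for s in symbols]


def vocoder(frames: List[float]) -> List[float]:
    """Audio synthesis stand-in: render every frame as a short sine tone."""
    samples = []
    for freq in frames:
        for n in range(FRAME_SAMPLES):
            samples.append(math.sin(2.0 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def synthesize(text: str) -> List[float]:
    """Chain the three stages: text -> symbols -> frames -> waveform samples."""
    return vocoder(acoustic_model(text_frontend(text)))


if __name__ == "__main__":
    wave = synthesize("Hi there!")
    print(len(wave), "samples")
```

End-to-end systems such as those listed here replace some or all of these hand-separated stages with learned models, but the interfaces between stages (symbols, frame-level features, waveform samples) remain a useful way to read the entries.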

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

25 Oct 2019 TensorSpeech/TensorflowTTS

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.

SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

FastSpeech: Fast, Robust and Controllable Text-to-Speech

22 May 2019 TensorSpeech/TensorflowTTS

Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lack of controllability (voice speed or prosody control).

SPEECH QUALITY TEXT-TO-SPEECH SYNTHESIS

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

24 Oct 2017 r9y9/deepvoice3_pytorch

This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without use of any recurrent units.

TEXT-TO-SPEECH SYNTHESIS

FastSpeech: Fast, Robust and Controllable Text to Speech

NeurIPS 2019 as-ideas/TransformerTTS

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.

SPEECH QUALITY SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

Neural Speech Synthesis with Transformer Network

19 Sep 2018 as-ideas/TransformerTTS

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) hard to model long dependency using current recurrent neural networks (RNNs).

MACHINE TRANSLATION SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

ICLR 2021 NVIDIA/flowtron

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric)

SPEECH QUALITY SPEECH SYNTHESIS STYLE TRANSFER TEXT-TO-SPEECH SYNTHESIS

WaveGrad: Estimating Gradients for Waveform Generation

ICLR 2021 ivanvovk/WaveGrad

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density.

SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS