Speech Synthesis

25 papers with code · Speech

Speech synthesis is the task of generating speech from text.

Please note that the state-of-the-art tables here are not directly comparable between studies, because each study uses mean opinion score (MOS) as its metric and collects different samples from Amazon Mechanical Turk.
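To illustrate why MOS figures from separate studies are hard to compare, here is a minimal sketch of how a mean opinion score and an approximate confidence interval could be computed from listener ratings. The function name, the 1-5 rating scale, the normal-approximation interval, and the two rating lists are assumptions made for illustration; they are not taken from any of the papers listed here.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of listener ratings.

    Assumes ratings on a 1-5 scale and uses a normal approximation
    for the interval, which is only a rough guide for small samples.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mos = statistics.mean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from two separate listener pools (made-up numbers).
study_a = [4, 5, 4, 4, 3, 5, 4, 4]
study_b = [3, 4, 4, 3, 4, 3, 4, 3]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, hw = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {hw:.2f}")
```

Because each pool of raters hears different audio samples and applies its own sense of the scale, a gap between two such scores may reflect the listeners as much as the systems.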

Greatest papers with code

WaveNet: A Generative Model for Raw Audio

12 Sep 2016 buriburisuri/speech-to-text-wavenet

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.

AUDIO GENERATION · SPEECH SYNTHESIS

Tacotron: Towards End-to-End Speech Synthesis

29 Mar 2017 keithito/tacotron

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS

WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 Oct 2018 NVIDIA/waveglow

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.

SPEECH SYNTHESIS

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

ICLR 2018 r9y9/deepvoice3_pytorch

We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.

SPEECH SYNTHESIS

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 Dec 2017 Rayhane-mamah/Tacotron-2

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.

SPEECH SYNTHESIS

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

25 May 2018 NVIDIA/OpenSeq2Seq

We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.

MACHINE TRANSLATION · SPEECH RECOGNITION · SPEECH SYNTHESIS

Deep Voice: Real-time Neural Text-to-Speech

ICML 2017 NVIDIA/nv-wavenet

We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks.

BOUNDARY DETECTION · FEATURE ENGINEERING · SPEECH SYNTHESIS

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

28 Mar 2019 mozilla/LPCNet

We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate.

SPEECH SYNTHESIS

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

28 Oct 2018 mozilla/LPCNet

We demonstrate that LPCNet can achieve significantly higher quality than WaveRNN for the same network size and that high quality LPCNet speech synthesis is achievable with a complexity under 3 GFLOPS.

SPEECH SYNTHESIS

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

23 Sep 2017 r9y9/gantts

In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator.

SPEECH SYNTHESIS
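The training objective summarized in the entry above (a weighted sum of a generation loss and an adversarial loss) can be sketched roughly as follows. This is an illustrative sketch only, not the r9y9/gantts implementation: the function names, the use of mean squared error as a stand-in for the generation loss, the cross-entropy form of the adversarial term, and the `adv_weight` parameter are all assumptions.

```python
import math


def generation_loss(predicted, target):
    """Mean squared error, used here as a stand-in for the generation loss."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def adversarial_loss(disc_probs):
    """Loss for deceiving the discriminator.

    disc_probs holds the discriminator's estimated probability that each
    generated frame is natural; the loss falls as those estimates approach 1.
    """
    return -sum(math.log(p) for p in disc_probs) / len(disc_probs)


def acoustic_model_loss(predicted, target, disc_probs, adv_weight=1.0):
    """Weighted sum of the generation loss and the adversarial loss."""
    return generation_loss(predicted, target) + adv_weight * adversarial_loss(disc_probs)


# Hypothetical values: two predicted speech parameters and the
# discriminator's outputs for the corresponding generated frames.
loss = acoustic_model_loss(
    predicted=[1.0, 2.0],
    target=[1.0, 4.0],
    disc_probs=[0.5, 0.5],
    adv_weight=0.5,
)
print(f"combined loss: {loss:.4f}")
```

Setting `adv_weight` to 0 reduces the objective to the conventional generation loss alone, which makes the weight a convenient knob for comparing the two training regimes.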