Speech Synthesis

22 papers with code · Speech

Speech synthesis is the task of generating speech from text.

Please note that the state-of-the-art tables here are not directly comparable between studies, as each study uses mean opinion score as its metric and collects its own ratings from Amazon Mechanical Turk.
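As a rough illustration of why such scores do not transfer between studies, the sketch below computes a mean opinion score with a normal-approximation 95% confidence interval for two listener pools. The ratings, the 1 to 5 scale, the helper name `mean_opinion_score`, and the choice of interval are all assumptions made for this example; none of them come from the papers listed here.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (mean, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    # Normal-approximation interval: an assumption, studies differ here too.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width

# Hypothetical ratings of the same audio from two different listener pools.
study_a = [5, 4, 5, 4, 4, 5, 4, 5]
study_b = [4, 3, 4, 4, 3, 4, 3, 4]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"study A: {mos_a:.2f} ± {ci_a:.2f}")
print(f"study B: {mos_b:.2f} ± {ci_b:.2f}")
```

In this made-up case the two pools disagree by more than their intervals, even though they rate the same audio, so a gap of that size between two papers would say little about the systems themselves.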

State-of-the-art leaderboards

Greatest papers with code

WaveNet: A Generative Model for Raw Audio

12 Sep 2016 buriburisuri/speech-to-text-wavenet

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.

AUDIO GENERATION · SPEECH SYNTHESIS

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

NeurIPS 2018 tensorflow/lingvo

We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training.

SPEAKER VERIFICATION · SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS · TRANSFER LEARNING

Tacotron: Towards End-to-End Speech Synthesis

29 Mar 2017 keithito/tacotron

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS

WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 Oct 2018 NVIDIA/waveglow

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.

SPEECH SYNTHESIS

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

ICLR 2018 r9y9/deepvoice3_pytorch

We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.

SPEECH SYNTHESIS

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 Dec 2017 Rayhane-mamah/Tacotron-2

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.

SPEECH SYNTHESIS

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

25 May 2018 NVIDIA/OpenSeq2Seq

We present OpenSeq2Seq, a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.

MACHINE TRANSLATION · SPEECH RECOGNITION · SPEECH SYNTHESIS

Deep Voice: Real-time Neural Text-to-Speech

ICML 2017 NVIDIA/nv-wavenet

We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks.

BOUNDARY DETECTION · SPEECH SYNTHESIS

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

28 Mar 2019 mozilla/LPCNet

We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate.

SPEECH SYNTHESIS

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

28 Oct 2018 mozilla/LPCNet

We demonstrate that LPCNet can achieve significantly higher quality than WaveRNN for the same network size and that high quality LPCNet speech synthesis is achievable with a complexity under 3 GFLOPS.

SPEECH SYNTHESIS