Speech Synthesis

27 papers with code · Speech

Speech synthesis is the task of generating speech from text.

Please note that the state-of-the-art tables here are not directly comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects ratings from a different sample of Amazon Mechanical Turk listeners.
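
To make the comparability caveat concrete, here is a minimal sketch of how a mean opinion score and an approximate 95% confidence interval might be computed from listener ratings. The function name `mean_opinion_score`, the 1-5 rating scale, the normal-approximation interval, and the two rating lists are illustrative assumptions, not data or methodology from any paper listed on this page.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for a list of listener ratings.

    half_width is a normal-approximation 95% confidence half-width
    (an assumption; individual studies may report intervals differently).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings of the same audio from two different listener pools.
study_a = [5, 4, 4, 5, 4, 5, 4, 4]
study_b = [4, 3, 4, 4, 3, 4, 3, 4]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, hw = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {hw:.2f}")
```

In this made-up example the two pools rate identical audio differently, so the gap in MOS reflects the listeners rather than the system. That is the reason scores from separate studies should not be ranked against each other.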

Latest papers with code

Deep Residual Neural Networks for Audio Spoofing Detection

30 Jun 2019 · nesl/asvspoof2019

Additionally, replay attacks, in which the attacker uses a speaker to replay previously recorded genuine human speech, are also possible.

SPEAKER VERIFICATION SPEECH SYNTHESIS VOICE CONVERSION

2

Using generative modelling to produce varied intonation for speech synthesis

10 Jun 2019 · ZackHodari/average_prosody

A generative model that can synthesise multiple prosodies will, by design, not model average prosody.

SPEECH SYNTHESIS

12

MelNet: A Generative Model for Audio in the Frequency Domain

4 Jun 2019 · fatchord/MelNet

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.

AUDIO GENERATION MUSIC GENERATION SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

156

Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems

21 May 2019 · sewplay/demos

In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method.

SPEECH SYNTHESIS

0

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

28 Mar 2019 · mozilla/LPCNet

We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate.

SPEECH SYNTHESIS

465

Learning latent representations for style control and transfer in end-to-end speech synthesis

11 Dec 2018 · yanggeng1995/vae_tacotron

In this paper, we introduce the Variational Autoencoder (VAE) to an end-to-end speech synthesis model, to learn the latent representation of speaking styles in an unsupervised manner.

SPEECH SYNTHESIS STYLE TRANSFER

23

Learning pronunciation from a foreign language in speech synthesis networks

23 Nov 2018 · Kyubyong/g2p

First, we train the speech synthesis network bilingually in English and Korean and analyze how the network learns the relations of phoneme pronunciation between the languages.

SPEECH SYNTHESIS

138

WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 Oct 2018 · NVIDIA/waveglow

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.

SPEECH SYNTHESIS

1,210

Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

29 Oct 2018 · nii-yamagishilab/self-attention-tacotron

Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons.

SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

78

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

28 Oct 2018 · mozilla/LPCNet

We demonstrate that LPCNet can achieve significantly higher quality than WaveRNN for the same network size and that high quality LPCNet speech synthesis is achievable with a complexity under 3 GFLOPS.

SPEECH SYNTHESIS

465