Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 Dec 2017 · Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of $4.53$, comparable to a MOS of $4.58$ for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and $F_0$ features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture.
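
The abstract's intermediate representation is a mel-scale spectrogram, whose frequency bands are spaced evenly on the mel scale rather than in Hz. The sketch below is only an illustration of that spacing, not the authors' feature pipeline: it uses the common HTK-style mel formula, and the band count and frequency range passed at the bottom are illustrative assumptions that the abstract does not state.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min, f_max):
    """Center frequencies (Hz) of n_bands bands equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_bands)]


# Illustrative values only; they are not taken from the abstract.
centers = mel_band_centers(n_bands=80, f_min=125.0, f_max=7600.0)
print(f"lowest band spacing:  {centers[1] - centers[0]:.1f} Hz")
print(f"highest band spacing: {centers[-1] - centers[-2]:.1f} Hz")
```

Because the spacing is uniform in mels, neighbouring bands sit closer together in Hz at low frequencies than at high ones, which is what makes the representation compact relative to a linear-frequency spectrogram.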

Code

Repository	Stars
coqui-ai/TTS	29,183
PaddlePaddle/PaddleSpeech	10,131
NVIDIA/tacotron2	4,892
TensorSpeech/TensorflowTTS	3,698
Rayhane-mamah/Tacotron-2	2,233

Showing 5 of 30 implementations.

Tasks

Speech Synthesis

Results from the Paper

Ranked #2 on Speech Synthesis on North American English

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Speech Synthesis	North American English	Tacotron 2	Mean Opinion Score	4.526	#1
Speech Synthesis	North American English	WaveNet (Linguistic)	Mean Opinion Score	4.341	#2
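
A mean opinion score is the average of listener ratings, commonly collected on a 1 to 5 scale. The sketch below shows one way such a score and an approximate 95% confidence interval could be computed. The ratings are made up for illustration, the helper name is hypothetical, and the normal-approximation interval is an assumption; this page does not describe the paper's actual rating protocol.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, half-width of an approximate 95% CI) for 1-5 ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    mean = statistics.fmean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    # 1.96 is the normal-approximation multiplier for a 95% interval.
    return mean, 1.96 * sem


# Hypothetical listener ratings, not data from the paper.
ratings = [5, 4.5, 4, 5, 4.5, 4, 5, 4.5]
mos, half_width = mean_opinion_score(ratings)
print(f"MOS = {mos:.3f} +/- {half_width:.3f}")
```

With only a handful of ratings the interval is wide, so small gaps between systems, such as those in the table above, are only meaningful when each score rests on many ratings.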

Methods

Batch Normalization • BiLSTM • Convolution • Dilated Causal Convolution • Dropout • Exponential Decay • Linear Layer • Location Sensitive Attention • LSTM • Max Pooling • Mixture of Logistic Distributions • ReLU • Residual Connection • Tacotron 2 • Tanh Activation • WaveNet • Weight Decay • Zoneout
