WaveTTS

Introduced by Liu et al. in WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

WaveTTS is a Tacotron-based text-to-speech architecture trained with two loss functions: 1) a time-domain loss, called the waveform loss, that measures the distortion between the natural and generated waveforms; and 2) a frequency-domain loss that measures the distance between the natural and generated Mel-scale acoustic features.
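The way the two losses combine can be illustrated with a minimal sketch. The description above does not specify the distortion measure or how the two terms are weighted, so everything below is an assumption made for illustration: mean squared error stands in for both terms, `time_weight` is a hypothetical balancing factor, and all function names are invented here. The sketch also assumes the generated waveform has already been reconstructed from the predicted features and aligned to the natural one.

```python
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average squared difference between two equal-length sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def frequency_domain_loss(
    mel_natural: Sequence[Sequence[float]],
    mel_generated: Sequence[Sequence[float]],
) -> float:
    """Distance between natural and generated Mel-scale features.

    Each input is a list of frames, each frame a list of Mel bins.
    MSE is an assumed stand-in for the feature loss.
    """
    flat_natural = [v for frame in mel_natural for v in frame]
    flat_generated = [v for frame in mel_generated for v in frame]
    return mean_squared_error(flat_natural, flat_generated)


def time_domain_loss(
    wav_natural: Sequence[float],
    wav_generated: Sequence[float],
) -> float:
    """Distortion between natural and generated waveform samples.

    MSE is an assumed stand-in for the waveform loss.
    """
    return mean_squared_error(wav_natural, wav_generated)


def joint_time_frequency_loss(
    mel_natural: Sequence[Sequence[float]],
    mel_generated: Sequence[Sequence[float]],
    wav_natural: Sequence[float],
    wav_generated: Sequence[float],
    time_weight: float = 1.0,  # hypothetical weighting, not from the source
) -> float:
    """Sum of the frequency-domain loss and the weighted time-domain loss."""
    freq = frequency_domain_loss(mel_natural, mel_generated)
    time = time_domain_loss(wav_natural, wav_generated)
    return freq + time_weight * time
```

In a real training setup both terms would be computed on tensors inside an automatic-differentiation framework so that gradients from the waveform term reach the feature prediction network; the plain-Python version here only shows the arithmetic.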

The motivation arises from Tacotron 2, whose feature prediction network is trained independently of the WaveNet vocoder. At run-time, the feature prediction network and the WaveNet vocoder are artificially joined together. As a result, the framework suffers from a mismatch between the frequency-domain acoustic features and the time-domain waveform. To overcome this mismatch, WaveTTS is trained with a joint time-frequency domain loss, which the authors report improves the synthesized voice quality.

Source: WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

Components

Component	Type
BiLSTM	Deep Tabular Learning
Convolution	Convolutions
Griffin-Lim Algorithm	Phase Reconstruction
Linear Layer	Feedforward Networks
Location Sensitive Attention	Attention Mechanisms
ReLU	Activation Functions
WaveNet	Generative Audio Models

Categories

Text-to-Speech Models

Sequence To Sequence Models
