ParaNet

Introduced by Peng et al. in Non-Autoregressive Neural Text-to-Speech

ParaNet is a fully convolutional, non-autoregressive, attention-based architecture for text-to-speech that converts text to a mel spectrogram. It distills attention from an autoregressive text-to-spectrogram teacher model and iteratively refines the alignment between text and spectrogram layer by layer. The architecture is otherwise similar to Deep Voice 3 (DV3); the differences are in the decoder. The DV3 decoder has multiple attention-based layers, each consisting of a causal convolution block followed by an attention block. To simplify attention distillation, the autoregressive teacher keeps only a single attention block in its decoder, while the non-autoregressive ParaNet decoder applies attention at successive layers to refine the alignment.

Source: Non-Autoregressive Neural Text-to-Speech
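
The attention distillation described above can be illustrated with a minimal sketch. It assumes the distillation objective is a cross entropy between the teacher's attention and the student's attention, averaged over spectrogram frames and over the student's attention layers. The function names (softmax, attention_distillation_loss) and the toy matrices are hypothetical and are not taken from any released implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_distillation_loss(student_layers, teacher, eps=1e-8):
    """Average cross entropy between teacher and student attention.

    student_layers: one attention matrix per student attention layer.
        Each matrix is a list of rows (one per spectrogram frame), and
        each row is a distribution over text positions.
    teacher: a single attention matrix of the same shape, taken from
        the autoregressive teacher (assumed to be fixed, not trained).
    """
    total = 0.0
    count = 0
    for student in student_layers:
        for t_row, s_row in zip(teacher, student):
            # Cross entropy of one frame: -sum_j teacher_j * log(student_j)
            total += -sum(t * math.log(s + eps) for t, s in zip(t_row, s_row))
            count += 1
    return total / count


# Toy example: 2 spectrogram frames attending over 3 text positions.
teacher = [softmax([4.0, 0.0, 0.0]), softmax([0.0, 4.0, 0.0])]

# A student whose two attention layers already match the teacher.
aligned = [teacher, teacher]

# A student whose two attention layers are uniform (no alignment learned).
uniform_row = [1.0 / 3.0] * 3
uniform = [[uniform_row, uniform_row], [uniform_row, uniform_row]]

loss_aligned = attention_distillation_loss(aligned, teacher)
loss_uniform = attention_distillation_loss(uniform, teacher)
print(loss_aligned < loss_uniform)
```

In this sketch the loss is lower when the student's attention matches the teacher's than when it is uniform, which is the signal a distillation term of this kind would provide during training. How such a term is weighted against the spectrogram reconstruction loss is not covered here.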

Tasks

Task	Papers	Share
GPR	1	33.33%
Point Cloud Upsampling	1	33.33%
Text-To-Speech Synthesis	1	33.33%

Components

Component	Type
Dense Connections	Feedforward Networks
Leaky ReLU	Activation Functions
ParaNet Convolution Block	Audio Model Blocks
ReLU	Activation Functions
Softsign Activation	Activation Functions
Weight Normalization	Normalization

Categories

Text-to-Speech Models

Sequence To Sequence Models