CBHG

Introduced by Wang et al. in Tacotron: Towards End-to-End Speech Synthesis

CBHG is a building block used in the Tacotron text-to-speech model. It consists of a bank of 1-D convolutional filters, followed by highway networks and a bidirectional gated recurrent unit (BiGRU).

The module extracts representations from sequences. The input sequence is first convolved with $K$ sets of 1-D convolutional filters, where the $k$-th set contains $C_{k}$ filters of width $k$, for $k = 1, 2, \dots, K$. These filters explicitly model local and contextual information (akin to modeling unigrams, bigrams, and so on up to $K$-grams). The convolution outputs are stacked together and max pooled along time to increase local invariance; the pooling uses a stride of 1 to preserve the original time resolution. The pooled sequence is then passed to a few fixed-width 1-D convolutions, whose outputs are added to the original input sequence via residual connections. Batch normalization is used for all convolutional layers. The result is fed into a multi-layer highway network to extract high-level features. Finally, a bidirectional GRU RNN is stacked on top to extract sequential features from both forward and backward context.
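The data flow described above can be illustrated with a minimal pure-Python sketch. It is not the Tacotron implementation. The function names (`conv1d_same`, `max_pool_stride1`, `highway`, `cbhg_front`), the random weights, the small sizes, the pooling width of 2, and the use of the same filter count `C` for every bank are all assumptions made for illustration. For brevity it uses a single projection convolution and a single highway layer, and it omits batch normalization and the final bidirectional GRU.

```python
import math
import random

def conv1d_same(x, w):
    """Stride-1 convolution over time with zero padding, so T is preserved.
    x: T frames of c_in floats; w: kernel indexed as w[tap][c_in][c_out]."""
    T, width, c_out = len(x), len(w), len(w[0][0])
    left = (width - 1) // 2  # assumed split; even widths pad more on the right
    out = []
    for t in range(T):
        frame = [0.0] * c_out
        for j in range(width):
            s = t + j - left
            if 0 <= s < T:  # taps outside the sequence see zeros
                for i, xi in enumerate(x[s]):
                    for o in range(c_out):
                        frame[o] += xi * w[j][i][o]
        out.append(frame)
    return out

def max_pool_stride1(x, width=2):
    """Max over frames t .. t+width-1 (clipped at the end), one output per frame."""
    T = len(x)
    return [[max(x[s][c] for s in range(t, min(t + width, T)))
             for c in range(len(x[0]))] for t in range(T)]

def linear(v, w, b):
    return [sum(vi * w[i][o] for i, vi in enumerate(v)) + b[o]
            for o in range(len(b))]

def highway(x, wh, bh, wt, bt):
    """Highway layer: gate * H(x) + (1 - gate) * x, applied per frame."""
    out = []
    for v in x:
        h = [max(0.0, a) for a in linear(v, wh, bh)]               # ReLU transform
        g = [1.0 / (1.0 + math.exp(-a)) for a in linear(v, wt, bt)]  # sigmoid gate
        out.append([gi * hi + (1.0 - gi) * vi for gi, hi, vi in zip(g, h, v)])
    return out

def rand_kernel(rng, width, c_in, c_out):
    # Hypothetical random weights; a trained model would learn these.
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_out)]
             for _ in range(c_in)] for _ in range(width)]

def cbhg_front(x, K=4, C=3, seed=0):
    """Conv bank -> stride-1 max pool -> projection conv -> residual -> highway."""
    rng = random.Random(seed)
    T, D = len(x), len(x[0])
    # Bank of K filter sets with widths 1..K, each producing C channels.
    bank = [conv1d_same(x, rand_kernel(rng, k, D, C)) for k in range(1, K + 1)]
    # Stack the K outputs along the channel axis (K * C channels per frame).
    stacked = [[max(0.0, v) for b in bank for v in b[t]] for t in range(T)]
    pooled = max_pool_stride1(stacked, width=2)
    # Fixed-width convolution projecting back to D channels for the residual add.
    proj = conv1d_same(pooled, rand_kernel(rng, 3, K * C, D))
    residual = [[p + xi for p, xi in zip(pf, xf)] for pf, xf in zip(proj, x)]
    wh = rand_kernel(rng, 1, D, D)[0]
    wt = rand_kernel(rng, 1, D, D)[0]
    return highway(residual, wh, [0.0] * D, wt, [-1.0] * D)

if __name__ == "__main__":
    seq = [[0.1 * t, -0.2 * t] for t in range(6)]  # T = 6 frames, D = 2 channels
    out = cbhg_front(seq)
    print(len(out), len(out[0]))  # 6 2: time resolution and width are unchanged
```

Because every convolution and the pooling step use a stride of 1 with padding, the output has one frame per input frame, which is what allows the residual addition with the original input.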

Source: Tacotron: Towards End-to-End Speech Synthesis

Tasks

Task	Papers	Share
Speech Synthesis	42	42.42%
Text-To-Speech Synthesis	15	15.15%
Sentence	6	6.06%
Voice Cloning	5	5.05%
Voice Conversion	4	4.04%
Speech Recognition	4	4.04%
Expressive Speech Synthesis	3	3.03%
Self-Supervised Learning	2	2.02%
Speaker Verification	2	2.02%

Components

Component	Type
Batch Normalization	Normalization
BiGRU	Bidirectional Recurrent Neural Networks
Convolution	Convolutions
Highway Network	Feedforward Networks
Max Pooling	Pooling Operations
ReLU	Activation Functions
Residual Connection	Skip Connections

Categories

Speech Synthesis Blocks

Sequential Blocks

Skip Connection Blocks