WaveNet

Introduced by Oord et al. in WaveNet: A Generative Model for Raw Audio

WaveNet is an audio generative model based on the PixelCNN architecture. In order to deal with long-range temporal dependencies needed for raw audio generation, architectures are developed based on dilated causal convolutions, which exhibit very large receptive fields.

The joint probability of a waveform $\vec{x} = { x_1, \dots, x_T }$ is factorised as a product of conditional probabilities as follows:

$$p\left(\vec{x}\right) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots ,x_{t-1}\right)$$

Each audio sample $x_t$ is therefore conditioned on the samples at all previous timesteps.

Source: WaveNet: A Generative Model for Raw Audio

Read Paper See Code

Papers

Paper	Code	Results	Date	Stars

Tasks

Task	Papers	Share
Speech Synthesis	53	22.84%
Decoder	18	7.76%
Text-To-Speech Synthesis	16	6.90%
Voice Conversion	12	5.17%
Audio Generation	8	3.45%
Speech Enhancement	6	2.59%
Time Series Analysis	5	2.16%
Speech Recognition	5	2.16%
Translation	5	2.16%

Usage Over Time

This feature is experimental; we are continuously improving our matching algorithm.

Components

Component	Type	Add Remove
Dilated Causal Convolution	Temporal Convolutions
Mixture of Logistic Distributions	Output Functions

Categories

Add Remove

Generative Audio Models