VQ-VAE

Introduced by van den Oord et al. in Neural Discrete Representation Learning

VQ-VAE is a variational autoencoder that uses vector quantisation (VQ) to learn a discrete latent representation. It differs from a standard VAE in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. Quantising the latents lets the model avoid posterior collapse, a failure typically observed in the VAE framework in which the latents are ignored when they are paired with a powerful autoregressive decoder. When these discrete representations are paired with an autoregressive prior, the model can generate high-quality images, videos, and speech, and can also perform high-quality speaker conversion and unsupervised learning of phonemes.
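
The quantisation step described above can be illustrated with a minimal sketch: each continuous encoder output is replaced by its nearest entry in a codebook, and the index of that entry is the discrete code. The function names, the codebook values, and the encoder outputs below are hypothetical and chosen only for illustration; the sketch uses plain Python and leaves out training entirely, so it is not the authors' implementation.

```python
# Minimal sketch of the vector-quantisation lookup (plain Python, no dependencies).
# All names and values here are illustrative assumptions, not taken from the paper.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantise(encoder_outputs, codebook):
    """Map each continuous encoder output to its nearest codebook entry.

    Returns (indices, quantised): the discrete codes and the codebook
    vectors that would be passed on to the decoder.
    """
    indices = []
    for z_e in encoder_outputs:
        distances = [squared_distance(z_e, e_k) for e_k in codebook]
        indices.append(distances.index(min(distances)))
    quantised = [codebook[k] for k in indices]
    return indices, quantised


# Hypothetical codebook with 3 entries of dimension 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
# Hypothetical encoder outputs for three latent positions.
encoder_outputs = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

indices, quantised = quantise(encoder_outputs, codebook)
print(indices)    # discrete latent codes
print(quantised)  # vectors that would be fed to the decoder
```

In this sketch the list of indices is the discrete latent representation; it is this sequence of codes that an autoregressive prior would be fitted to.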

Source: Neural Discrete Representation Learning

Tasks

Task	Papers	Share
Quantization	22	9.36%
Image Generation	19	8.09%
Speech Synthesis	9	3.83%
Language Modelling	8	3.40%
Motion Synthesis	7	2.98%
Image Reconstruction	7	2.98%
Denoising	5	2.13%
Text-to-Image Generation	5	2.13%
Music Generation	5	2.13%

Categories

Generative Models

Likelihood-Based Generative Models