VideoGPT: Video Generation using VQ-VAE and Transformers

20 Apr 2021  ·  Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

We present VideoGPT: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos. VideoGPT uses a VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings. Despite the simplicity in formulation and ease of training, our architecture is able to generate samples competitive with state-of-the-art GAN models for video generation on the BAIR Robot dataset, and to generate high-fidelity natural videos from UCF-101 and the Tumblr GIF Dataset (TGIF). We hope our proposed architecture serves as a reproducible reference for a minimalistic implementation of transformer-based video generation models. Samples and code are available at https://wilson1yan.github.io/videogpt/index.html
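As a rough illustration of the first stage described above, the sketch below encodes a short clip with strided 3D convolutions and quantizes the resulting latents against a learned codebook using a straight-through estimator (the standard VQ-VAE objective). The class names (`Encoder3D`, `VectorQuantizer`), layer sizes, and the 16-frame 64x64 input are illustrative assumptions rather than the authors' implementation; the axial self-attention blocks and the decoder used in the paper are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through estimator."""
    def __init__(self, num_codes=1024, dim=256, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):                      # z: (B, D, T, H, W)
        z_flat = z.permute(0, 2, 3, 4, 1).reshape(-1, z.shape[1])
        dists = torch.cdist(z_flat, self.codebook.weight)          # (N, num_codes)
        codes = dists.argmin(dim=1)
        z_q = self.codebook(codes).view(z.shape[0], *z.shape[2:], z.shape[1])
        z_q = z_q.permute(0, 4, 1, 2, 3)
        # Codebook loss + commitment loss from the VQ-VAE objective.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z_q.detach(), z)
        z_q = z + (z_q - z).detach()           # straight-through gradient
        return z_q, codes.view(z.shape[0], *z.shape[2:]), loss

class Encoder3D(nn.Module):
    """Strided 3D convolutions that downsample a raw clip in space and time."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, dim // 2, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(dim // 2, dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(dim, dim, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, video):                  # video: (B, 3, T, H, W)
        return self.net(video)                 # latents: (B, D, T/4, H/4, W/4)

# Usage: encode a 16-frame 64x64 clip into a grid of discrete codes; a GPT-style
# transformer (second stage, not shown here) then models the flattened code sequence.
video = torch.randn(1, 3, 16, 64, 64)
encoder, quantizer = Encoder3D(), VectorQuantizer()
z_q, codes, vq_loss = quantizer(encoder(video))
print(codes.shape)                             # torch.Size([1, 4, 16, 16])
```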

Task             | Dataset                                     | Model    | Metric Name     | Metric Value | Global Rank
Video Generation | BAIR Robot Pushing                          | VideoGPT | FVD score       | 103.3        | #9
                 |                                             |          | Cond            | 1            | #1
                 |                                             |          | Pred            | 15           | #8
                 |                                             |          | Train           | 15           | #2
Video Generation | UCF-101 (16 frames, 128x128, Unconditional) | VideoGPT | Inception Score | 24.69        | #3
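The BAIR entries above correspond to conditional prediction; the Cond/Pred values appear to denote 1 conditioning frame and 15 predicted frames. Below is a minimal, hypothetical sketch of the second stage this evaluation relies on: a GPT-like causal transformer over the flattened VQ-VAE codes with additive spatio-temporal position embeddings, plus ancestral sampling seeded with the codes of a conditioning frame. The names (`LatentPrior`, `sample`), model sizes, and latent shape are illustrative assumptions, not the authors' implementation; the paper's conditioning mechanism and cached sampling are omitted.

```python
import torch
import torch.nn as nn

class LatentPrior(nn.Module):
    """GPT-style causal transformer over flattened VQ codes, with additive
    spatio-temporal position embeddings (one per time/height/width coordinate)."""
    def __init__(self, num_codes=1024, dim=256, depth=4, heads=8, shape=(4, 16, 16)):
        super().__init__()
        t, h, w = shape
        self.shape = shape
        self.tok = nn.Embedding(num_codes, dim)
        self.pos_t = nn.Parameter(0.02 * torch.randn(t, dim))
        self.pos_h = nn.Parameter(0.02 * torch.randn(h, dim))
        self.pos_w = nn.Parameter(0.02 * torch.randn(w, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, codes):                  # codes: (B, L) flattened latent indices
        t, h, w = self.shape
        pos = (self.pos_t[:, None, None] + self.pos_h[None, :, None]
               + self.pos_w[None, None, :]).reshape(t * h * w, -1)
        x = self.tok(codes) + pos[: codes.shape[1]]
        L = codes.shape[1]
        # Causal mask so each code attends only to previous codes.
        mask = torch.triu(torch.full((L, L), float("-inf"), device=codes.device), diagonal=1)
        return self.head(self.blocks(x, mask=mask))

@torch.no_grad()
def sample(prior, primer, length):
    """Ancestral sampling of the remaining codes, starting from `primer`
    (e.g. the VQ codes of a single conditioning frame). Slow without caching."""
    codes = primer
    while codes.shape[1] < length:
        logits = prior(codes)[:, -1]
        nxt = torch.multinomial(logits.softmax(-1), 1)
        codes = torch.cat([codes, nxt], dim=1)
    return codes

prior = LatentPrior()
primer = torch.randint(0, 1024, (1, 16 * 16))      # stand-in for one latent frame of codes
codes = sample(prior, primer, length=4 * 16 * 16)  # fill in the remaining latent frames
print(codes.shape)                                 # torch.Size([1, 1024])
```

The sampled code grid would then be decoded back to pixels by the VQ-VAE decoder (not shown); FVD on BAIR and Inception Score on UCF-101 are computed on those decoded videos.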

Methods