Video Generation

239 papers with code • 15 benchmarks • 14 datasets

(Various video generation tasks. GIF credit: MAGVIT)

Benchmarks

These leaderboards track progress in video generation.

Dataset	Best Model
UCF-101	W.A.L.T-XL (class-conditional)
BAIR Robot Pushing	MAGVIT
Sky Time-lapse	StyleSV (256x256)
UCF-101 16 frames, 64x64, Unconditional	Make-A-Video (ours) vs. CogVideo (Chinese)
UCF-101 16 frames, Unconditional, Single GPU	TGAN-F
LAION-400M	Imagen original (constant=6)
Taichi	StyleSV (256x256)
UCF-101 16 frames, 128x128, Unconditional	TGANv2 (2020)
Kinetics-600 12 frames, 64x64	W.A.L.T-L
TrailerFaces	PG-SWGAN-3D
Kinetics-600 48 frames, 64x64	DVD-GAN
Kinetics-600 12 frames, 128x128	DVD-GAN
How2Sign	INR-V
YouTube Driving	StyleSV
MSR-VTT	VideoAssembler (Zero-Shot, 256x256, class-conditional)

Libraries

Use these libraries to find video generation models and implementations.

faceonlive/ai-research • 3 papers • 124 stars
stability-ai/generative-models • 2 papers • 22,088 stars
nvlabs/long-video-gan • 2 papers • 301 stars

Most implemented papers

Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN

pfnet-research/tgan2 • 22 Nov 2018

Training a Generative Adversarial Network (GAN) on a video dataset is challenging because of the sheer size of the dataset and the complexity of each observation.

Video Generation from Single Semantic Label Map

junting/seg2vid • CVPR 2019

This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.

DwNet: Dense warp-based network for pose-guided human video generation

UBC-Computer-Vision-Group/DwNet • 21 Oct 2019

In this paper, we focus on human motion transfer - generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video.

VirtualConductor: Music-driven Conducting Video Generation System

ChenDelong1999/VirtualConductor • 28 Jul 2021

In this demo, we present VirtualConductor, a system that can generate conducting video from any given music and a single user's image.

Diffusion Models: A Comprehensive Survey of Methods and Applications

YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy • 2 Sep 2022

This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.

Make-A-Video: Text-to-Video Generation without Text-Video Data

lucidrains/make-a-video-pytorch • 29 Sep 2022

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Phenaki: Variable Length Video Generation From Open Domain Textual Description

lucidrains/phenaki-pytorch • 5 Oct 2022

To the best of our knowledge, this is the first time a paper studies generating videos from time-variable prompts.

Scalable Adaptive Computation for Iterative Generation

google-research/pix2seq • 22 Dec 2022

We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e., latent self-conditioning.
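
As a rough illustration of the idea in this abstract, the following toy sketch carries latents from one reverse-diffusion pass into the next. It is not the paper's architecture: `toy_denoiser`, `sample`, the mixing weights, and the update rule are all hypothetical stand-ins chosen only to show how state from prior computation can condition the current pass.

```python
import random


def toy_denoiser(x_t, prev_latents):
    """Hypothetical stand-in for a denoising network.

    Returns (estimate of the clean sample, updated latents). The latents are
    warm-started from the previous pass instead of being recomputed from scratch.
    """
    summary = sum(x_t) / len(x_t)
    # New latents blend the previous latents with a summary of the current input.
    latents = [0.5 * z + 0.5 * summary for z in prev_latents]
    latent_mean = sum(latents) / len(latents)
    # The estimate depends on both the noisy input and the carried-over latents.
    x0_estimate = [0.9 * x + 0.1 * latent_mean for x in x_t]
    return x0_estimate, latents


def sample(num_steps=4, dim=3, num_latents=2, seed=0):
    """Toy reverse process that threads latents through every pass."""
    rng = random.Random(seed)
    x_t = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    latents = [0.0] * num_latents  # nothing to condition on before the first pass
    history = []
    for step in reversed(range(num_steps)):
        x0_estimate, latents = toy_denoiser(x_t, latents)
        # Simplified deterministic update toward the estimate (not a real sampler).
        x_t = [x + (e - x) / (step + 1) for x, e in zip(x_t, x0_estimate)]
        history.append(list(latents))
    return x_t, history
```

Calling `sample()` returns the final sample and the latents recorded after each pass; each entry in the history was computed from the one before it.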

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

iva-mzsun/moso • CVPR 2023

Experimental results demonstrate that our method achieves new state-of-the-art performance on five challenging benchmarks for video prediction and unconditional video generation: BAIR, RoboNet, KTH, KITTI and UCF101.

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

stability-ai/generative-models • CVPR 2023

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
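
One way to picture adding a temporal dimension to an image model is sketched below, under loud assumptions: `spatial_layer`, `temporal_layer`, `video_block`, and the blend weight `alpha` are hypothetical names, and the layers are trivial arithmetic rather than learned networks. The per-frame layer stands in for the pre-trained image model, and the temporal layer mixes information across frames.

```python
def spatial_layer(frame):
    """Hypothetical per-frame (image) layer: scales each latent value."""
    return [2.0 * v for v in frame]


def temporal_layer(frames):
    """Hypothetical temporal layer: averages each position over neighbouring frames."""
    num_frames = len(frames)
    mixed = []
    for t in range(num_frames):
        window = frames[max(0, t - 1):min(num_frames, t + 2)]
        mixed.append([sum(vals) / len(window) for vals in zip(*window)])
    return mixed


def video_block(frames, alpha):
    """Blend per-frame output with temporally mixed output.

    alpha = 1.0 ignores the temporal layer, so every frame is processed
    exactly as the image-only model would process it.
    """
    spatial_out = [spatial_layer(f) for f in frames]  # frames treated independently
    temporal_out = temporal_layer(spatial_out)        # information shared across time
    return [
        [alpha * s + (1.0 - alpha) * m for s, m in zip(s_frame, m_frame)]
        for s_frame, m_frame in zip(spatial_out, temporal_out)
    ]
```

In this sketch, setting `alpha` to 1.0 recovers the per-frame behaviour, which is why a blend of this kind lets fine-tuning on videos start from the image model's outputs.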
