Video Generation
239 papers with code • 15 benchmarks • 14 datasets
(Various Video Generation Tasks. GIF credit: MaGViT)
Most implemented papers
Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN
Training a Generative Adversarial Network (GAN) on a video dataset is challenging because of the sheer size of the dataset and the complexity of each observation.
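The abstract above only states the memory problem. The title's phrase "train sparsely" points at training on a subset of frames. As a minimal sketch of that general idea, and not the paper's actual sampling scheme, the hypothetical helper `sample_sparse_frames` keeps only `num_samples` frames from a clip at a random stride and offset. A training step then touches a fraction of the frames while still spanning a long time window.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_sparse_frames(frames: Sequence[T], num_samples: int, rng: random.Random) -> List[T]:
    """Return `num_samples` frames in temporal order, at a random stride and offset.

    Hypothetical helper (an assumption, not from the paper): only the sampled
    frames would be fed to the model, reducing per-step memory.
    """
    if num_samples <= 0 or num_samples > len(frames):
        raise ValueError("num_samples must be in [1, len(frames)]")
    # Largest stride that still fits num_samples frames inside the clip.
    max_stride = (len(frames) - 1) // (num_samples - 1) if num_samples > 1 else 1
    stride = rng.randint(1, max_stride)
    # Random start chosen so the last sampled index stays in range.
    max_start = len(frames) - 1 - stride * (num_samples - 1)
    start = rng.randint(0, max_start)
    return [frames[start + i * stride] for i in range(num_samples)]
```

For example, `sample_sparse_frames(list(range(64)), 8, random.Random(0))` returns 8 evenly spaced frame indices from a 64-frame clip.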
Video Generation from Single Semantic Label Map
This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.
DwNet: Dense warp-based network for pose-guided human video generation
In this paper, we focus on human motion transfer: generating a video that depicts a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video.
VirtualConductor: Music-driven Conducting Video Generation System
In this demo, we present VirtualConductor, a system that can generate a conducting video from any given piece of music and a single image of the user.
Diffusion Models: A Comprehensive Survey of Methods and Applications
This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.
Make-A-Video: Text-to-Video Generation without Text-Video Data
We propose Make-A-Video, an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).
Phenaki: Variable Length Video Generation From Open Domain Textual Description
To the best of our knowledge, this is the first time a paper studies generating videos from time-variable prompts.
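A time-variable prompt means that the conditioning text can change over the course of the video. One simple way to represent it, assumed here purely for illustration and not taken from Phenaki, is an ordered list of `(prompt, num_frames)` segments. The hypothetical `prompt_for_frame` looks up which prompt is active at a given frame index.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def prompt_for_frame(schedule: List[Tuple[str, int]], frame: int) -> str:
    """Return the prompt active at `frame`, given consecutive (prompt, num_frames) segments.

    Hypothetical data structure: each prompt is held for a fixed number of
    frames, and segments are concatenated in order.
    """
    # Exclusive end frame of each segment, e.g. [(a, 3), (b, 2)] -> [3, 5].
    ends = list(accumulate(n for _, n in schedule))
    if not ends or frame < 0 or frame >= ends[-1]:
        raise IndexError("frame outside the scheduled length")
    # First segment whose exclusive end is strictly greater than `frame`.
    return schedule[bisect_right(ends, frame)][0]
```

Using a cumulative-end list with `bisect_right` keeps each lookup logarithmic in the number of segments, which only matters for long prompt sequences.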
Scalable Adaptive Computation for Iterative Generation
We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e., latent self-conditioning.
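Latent self-conditioning can be sketched as a loop over reverse-diffusion steps. Each forward pass starts from the initial latents plus a transform of the latents produced by the previous pass. This sketch rests on several assumptions:

* The network is abstracted as a hypothetical `denoise_step(latents, t)` callable. The real model also reads the noisy data.
* Vectors are plain Python lists.
* A fixed scalar `self_cond_weight` stands in for whatever learned transform the model applies to the previous latents.

```python
from typing import Callable, List, Optional

Vector = List[float]


def run_reverse_process(
    num_steps: int,
    init_latents: Vector,
    denoise_step: Callable[[Vector, int], Vector],
    self_cond_weight: float = 0.5,
) -> Vector:
    """Run `num_steps` reverse steps, feeding each pass's output latents into the next pass.

    `denoise_step` and `self_cond_weight` are hypothetical placeholders for the
    model's forward pass and its learned self-conditioning transform.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    prev: Optional[Vector] = None
    for t in reversed(range(num_steps)):
        if prev is None:
            # First pass: nothing to condition on yet.
            latents = list(init_latents)
        else:
            # Self-conditioning: initial latents plus a (here linear) function
            # of the latents computed on the previous forward pass.
            latents = [z0 + self_cond_weight * zp for z0, zp in zip(init_latents, prev)]
        prev = denoise_step(latents, t)
    return prev
```

Setting `self_cond_weight=0.0` removes the recurrence, so every pass starts from the same initial latents. This makes the difference that self-conditioning introduces easy to isolate.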
MOSO: Decomposing MOtion, Scene and Object for Video Prediction
Experimental results demonstrate that our method achieves new state-of-the-art performance on five challenging benchmarks for video prediction and unconditional video generation: BAIR, RoboNet, KTH, KITTI and UCF101.
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
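Introducing a temporal dimension can be sketched under simplifying assumptions:

* Each frame is encoded independently into a flat latent vector.
* The `(batch * time)` list of frame latents is regrouped into per-video sequences.
* A hypothetical temporal layer, here just a mean over neighboring frames, is blended with the per-frame latents by a weight `alpha`.

With `alpha = 1.0` the frame latents pass through unchanged, which mirrors starting from a pure image model. The neighbor average is an illustrative stand-in, not the paper's temporal layers.

```python
from typing import List

Frame = List[float]  # one encoded frame latent, flattened


def to_video_batch(flat: List[Frame], num_frames: int) -> List[List[Frame]]:
    """Regroup a (batch * time) list of frame latents into [batch][time]."""
    if num_frames < 1 or len(flat) % num_frames:
        raise ValueError("len(flat) must be a positive multiple of num_frames")
    return [flat[i:i + num_frames] for i in range(0, len(flat), num_frames)]


def temporal_mix(video: List[Frame], alpha: float) -> List[Frame]:
    """Blend each frame latent with the mean of itself and its temporal neighbors.

    Hypothetical stand-in for a learned temporal layer; alpha = 1.0 keeps the
    per-frame (image-model) latents unchanged, alpha = 0.0 uses only the mix.
    """
    out = []
    for t, frame in enumerate(video):
        window = video[max(t - 1, 0): t + 2]  # previous, current, next frame
        mean = [sum(vals) / len(window) for vals in zip(*window)]
        out.append([alpha * s + (1 - alpha) * m for s, m in zip(frame, mean)])
    return out
```

Keeping the image-model path intact at `alpha = 1.0` is one way to add temporal modeling without discarding what the pre-trained image generator already produces.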