Video Generation
248 papers with code • 15 benchmarks • 14 datasets
(Various video generation tasks. GIF credit: MaGViT)
Libraries
Use these libraries to find Video Generation models and implementations.
Datasets
Latest papers
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than the modules based on latent spaces only, especially in the context of long video generation.
FlexiFilm: Long Video Generation with Flexible Conditions
Generating long and consistent videos has emerged as a significant yet challenging problem.
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image.
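The snippet above only names the "repeat-and-slide" strategy; a minimal toy sketch of the idea, with all names (`denoise_fn`, `repeat_and_slide`) hypothetical rather than taken from the TI2V-Zero code, might look like this: the conditioning window starts as repeated copies of the input image, and each newly synthesized frame slides into the window while the oldest entry drops out.

```python
from collections import deque

def repeat_and_slide(first_frame, num_frames, window, denoise_fn):
    """Toy sketch of a repeat-and-slide schedule (names hypothetical).

    The context window is initialized by repeating the conditioning
    image, so the frozen model initially attends only to that image;
    each denoised frame is then appended and the window slides forward.
    """
    ctx = deque([first_frame] * window, maxlen=window)
    frames = [first_frame]
    for _ in range(num_frames - 1):
        # Stand-in for one pass of the frozen diffusion model,
        # conditioned on the current sliding window of frames.
        nxt = denoise_fn(list(ctx))
        ctx.append(nxt)
        frames.append(nxt)
    return frames
```

With a dummy `denoise_fn` that returns the last context frame plus one, `repeat_and_slide(0, 4, 2, fn)` yields `[0, 1, 2, 3]`, illustrating how each frame is produced from the window that precedes it.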
Synthesizing Audio from Silent Video using Sequence to Sequence Modeling
Generating audio from a video's visual context has multiple practical applications in improving how we interact with audio-visual media - for example, enhancing CCTV footage analysis, restoring historical videos (e.g., silent movies), and improving video generation models.
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
Based on this pipeline, a random face reference training method is further devised to precisely capture the ID-relevant embeddings from reference images, thus improving the fidelity and generalization capacity of our model for ID-specific video generation.
TAVGBench: Benchmarking Text to Audible-Video Generation
To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 thousand hours.
On the Content Bias in Fréchet Video Distance
We show that FVD with features extracted from the recent large-scale self-supervised video models is less biased toward image quality.
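FVD compares real and generated videos by fitting a Gaussian to each set of extracted features and measuring the Fréchet distance between the two Gaussians; the paper's point is that the choice of feature extractor determines what the metric is sensitive to. A minimal sketch of the distance itself (the feature-extraction step is omitted):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (num_videos, feature_dim),
    e.g. per-video embeddings from I3D or a self-supervised video model.
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; discard the small
    # imaginary residue that sqrtm can introduce numerically.
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Swapping the backbone that produces `feats_real`/`feats_gen` changes the metric's bias: the finding above is that features from large self-supervised video models weight temporal realism more and per-frame image quality less.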
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Recent advances in Text-to-Video (T2V) generation have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Controllability plays a crucial role in video generation since it allows users to create desired content.