142 papers with code • 14 benchmarks • 17 datasets
Video Prediction is the task of predicting future frames given past video frames.
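At the tensor level, the task can be sketched as mapping a stack of observed frames to a stack of future frames. The following is a minimal illustration, not any paper's method: a trivial "copy the last frame" baseline that only pins down the expected input and output shapes (the `(T, H, W, C)` layout is an assumption for illustration).

```python
import numpy as np

def copy_last_frame_baseline(past_frames: np.ndarray, horizon: int) -> np.ndarray:
    """Predict future frames by repeating the most recent observed frame.

    past_frames: array of shape (T, H, W, C) holding T observed frames.
    horizon: number of future frames to predict.
    Returns an array of shape (horizon, H, W, C).
    """
    last = past_frames[-1]
    return np.stack([last] * horizon, axis=0)

# Example: 4 observed 8x8 RGB frames, predict 2 future frames.
past = np.random.rand(4, 8, 8, 3)
pred = copy_last_frame_baseline(past, horizon=2)
assert pred.shape == (2, 8, 8, 3)
```

Despite its simplicity, this baseline is a common sanity check: a learned model should at least outperform it on motion-heavy video.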
This allows the static scene to remain fixed and the motion of the ego-vehicle to be represented on the grid in the same way as that of other agents.
The task of video prediction and generation is notoriously difficult, and research in this area has largely been limited to short-term predictions.
The existing state-of-the-art method for audio-visual conditioned video prediction uses latent codes of the audio-visual frames from a multimodal stochastic network, together with a frame encoder, to predict the next visual frame.
Most existing approaches to video prediction build their models on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input and predicts the next frame recursively.
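The SISO rollout pattern can be sketched as a simple loop that feeds each predicted frame back in as the next input. The `toy_model` below is a made-up stand-in (a shifted average, not a real predictor) used only to show the recursion; error accumulation over such rollouts is one reason long-horizon prediction is hard.

```python
import numpy as np

def recursive_rollout(model, first_frame: np.ndarray, steps: int) -> list:
    """Single-In-Single-Out rollout: one frame in, one frame out,
    with each prediction fed back as the next input."""
    frames = []
    frame = first_frame
    for _ in range(steps):
        frame = model(frame)
        frames.append(frame)
    return frames

# Toy stand-in "model": blend the frame with a vertically shifted copy.
toy_model = lambda f: 0.5 * (f + np.roll(f, 1, axis=0))

preds = recursive_rollout(toy_model, np.ones((8, 8)), steps=3)
assert len(preds) == 3 and preds[0].shape == (8, 8)
```

Because each step consumes the previous step's output, small prediction errors compound, which motivates the multi-frame-input alternatives discussed in that line of work.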
Inspired by this, we introduce a novel task, text-guided video completion (TVC), which requires the model to generate a video from partial frames guided by an instruction.
In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning.
While recent object-centric models can successfully decompose a scene into objects, modeling their dynamics effectively still remains a challenge.
We propose a unified model for multiple conditional video synthesis tasks, including video prediction and video frame interpolation.
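The two tasks differ only in what they condition on: prediction conditions on past frames, while interpolation conditions on both boundary frames. As a hedged illustration of the interpolation side (not the proposed model, which is learned), here is the naive linear-blend baseline between two boundary frames:

```python
import numpy as np

def linear_interpolate(frame_a: np.ndarray, frame_b: np.ndarray, n_mid: int) -> list:
    """Naive frame interpolation: blend the two boundary frames at
    evenly spaced time steps strictly between 0 and 1."""
    ts = np.linspace(0.0, 1.0, n_mid + 2)[1:-1]  # interior time steps only
    return [(1.0 - t) * frame_a + t * frame_b for t in ts]

# Two intermediate frames between an all-zero and an all-one frame.
mids = linear_interpolate(np.zeros((4, 4)), np.ones((4, 4)), n_mid=2)
assert len(mids) == 2
```

Linear blending produces ghosting whenever objects move, which is precisely the failure mode that learned conditional synthesis models aim to fix.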