Video Prediction
142 papers with code • 14 benchmarks • 17 datasets
Video Prediction is the task of predicting future frames given past video frames.
Source: Photo-Realistic Video Prediction on Natural Videos of Largely Changing Frames
Libraries
Use these libraries to find Video Prediction models and implementationsDatasets
Latest papers with no code
Allo-centric Occupancy Grid Prediction for Urban Traffic Scene Using Video Prediction Networks
This allows for the static scene to remain fixed and to represent motion of the ego-vehicle on the grid like other agents'.
Long-horizon video prediction using a dynamic latent hierarchy
The task of video prediction and generation is known to be notoriously difficult, with the research in this area largely limited to short-term predictions.
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames from a multimodal stochastic network and a frame encoder to predict the next visual frame.
MIMO Is All You Need : A Strong Multi-In-Multi-Out Baseline for Video Prediction
The mainstream of the existing approaches for video prediction builds up their models based on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input to predict the next frame in a recursive manner.
Randomized Conditional Flow Matching for Video Prediction
We call our model Random frame conditional flow Integration for VidEo pRediction, or, in short, RIVER.
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Inspired by this, we introduce a novel task, text-guided video completion (TVC), which requests the model to generate a video from partial frames guided by an instruction.
3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval
In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning.
SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models
While recent object-centric models can successfully decompose a scene into objects, modeling their dynamics effectively still remains a challenge.
Continuous conditional video synthesis by neural processes
We propose a unified model for multiple conditional video synthesis tasks, including video prediction and video frame interpolation.
See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction
Cognitive planning is the structural decomposition of complex tasks into a sequence of future behaviors.