We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
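The four-dimensional dependency chain mentioned above can be written out with the chain rule of probability. The following is a sketch under assumed notation that does not appear in the abstract: a video tensor x with frame index t, row index i, column index j and color channel index c, sizes T, H, W and C, and x_{<(t,i,j,c)} standing for every value that comes earlier in one fixed ordering over time, space and color.

```latex
p(\mathbf{x}) \;=\; \prod_{t=1}^{T} \prod_{i=1}^{H} \prod_{j=1}^{W} \prod_{c=1}^{C}
    p\left( x_{t,i,j,c} \,\middle|\, \mathbf{x}_{<(t,i,j,c)} \right)
```

Under this reading, each factor is a discrete distribution over the possible values of a single pixel channel, so the log-likelihood of a whole video is the sum of the log-probabilities of its individual values.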

ICML 2017

Results from the Paper


 Ranked #1 on Video Prediction on KTH (Cond metric)

Task              Dataset  Model  Metric Name  Metric Value  Global Rank
Video Prediction  KTH      VPN    PSNR         23.76         # 29
Video Prediction  KTH      VPN    SSIM         0.746         # 28
Video Prediction  KTH      VPN    Cond         10            # 1
Video Prediction  KTH      VPN    Pred         20            # 1
