Stochastic Video Generation with a Learned Prior

ICML 2018 · Emily Denton, Rob Fergus

Generating video frames that accurately predict future world states is challenging. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this paper we introduce an unsupervised video generation model that learns a prior model of uncertainty in a given environment. Video frames are generated by drawing samples from this prior and combining them with a deterministic estimate of the future frame. The approach is simple and easily trained end-to-end on a variety of datasets. Sample generations are both varied and sharp, even many frames into the future, and compare favorably to those from existing approaches.
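
The generation procedure described in the abstract (draw a latent sample from a prior, then combine it with a deterministic estimate of the next frame) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `prior`, `predict_next`, and `rollout`, the dimensions, and the fixed random linear maps that stand in for the paper's trained networks. The prior here also looks only at the previous frame, which is a simplification. It shows the sampling loop only, not the authors' architecture or training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_DIM = 16   # flattened toy "frame" size (assumption)
LATENT_DIM = 4   # size of the latent variable z_t (assumption)

# Hypothetical stand-ins for learned networks: fixed random linear maps.
W_prior = rng.normal(scale=0.1, size=(FRAME_DIM, 2 * LATENT_DIM))
W_frame = rng.normal(scale=0.1, size=(FRAME_DIM, FRAME_DIM))
W_latent = rng.normal(scale=0.1, size=(LATENT_DIM, FRAME_DIM))


def prior(prev_frame):
    """Map the previous frame to the mean and log-variance of a Gaussian over z_t."""
    stats = prev_frame @ W_prior
    return stats[:LATENT_DIM], stats[LATENT_DIM:]


def predict_next(prev_frame, z):
    """Combine a deterministic estimate from the previous frame with the sampled latent."""
    return np.tanh(prev_frame @ W_frame + z @ W_latent)


def rollout(first_frame, steps, rng):
    """Generate `steps` future frames, feeding each prediction back in."""
    frames, frame = [], first_frame
    for _ in range(steps):
        mu, logvar = prior(frame)
        # Sample z_t from the frame-dependent prior.
        z = mu + np.exp(0.5 * logvar) * rng.standard_normal(LATENT_DIM)
        frame = predict_next(frame, z)
        frames.append(frame)
    return np.stack(frames)
```

Calling `rollout` twice on the same first frame with differently seeded generators yields different futures, which is the sense in which the generations are varied; the deterministic path through `predict_next` is shared by every sample.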


Results from the Paper


Task              Dataset             Model                         Metric Name  Metric Value   Global Rank
Video Generation  BAIR Robot Pushing  SVG-LP (from vRNN)            FVD score    256.62         # 19
                                                                    Cond         2              # 10
                                                                    SSIM         0.816±0.07     # 7
                                                                    LPIPS        0.061±0.03     # 5
                                                                    Pred         28             # 16
                                                                    Train        10             # 21
Video Generation  BAIR Robot Pushing  SVG (from SRVP)               FVD score    255±4          # 18
                                                                    Cond         2              # 10
                                                                    SSIM         0.8058±0.0088  # 9
                                                                    PSNR         18.95±0.26     # 6
                                                                    LPIPS        0.0609±0.0034  # 6
                                                                    Pred         28             # 16
                                                                    Train        12             # 16
Video Generation  BAIR Robot Pushing  SVG-FP (from FVD)             FVD score    315.5          # 22
                                                                    Cond         2              # 10
                                                                    Pred         14             # 2
                                                                    Train        14             # 10
Video Prediction  Cityscapes 128x128  SVG (from Hier-VRNN)          FVD          1300.26        # 3
                                                                    SSIM         0.574±0.08     # 5
                                                                    LPIPS        0.549±0.06     # 5
                                                                    Cond         2              # 1
                                                                    Pred         28             # 3
                                                                    Train        10             # 1
Video Prediction  KTH                 SVG-LP (from Grid-keypoints)  LPIPS        0.129          # 10
                                                                    PSNR         23.91          # 27
                                                                    FVD          157.9          # 3
                                                                    SSIM         0.800          # 20
                                                                    Cond         10             # 1
                                                                    Pred         40             # 22
                                                                    Params (M)   22.8           # 7
                                                                    Train        10             # 1
Video Prediction  KTH                 SVG-LP (from SRVP)            LPIPS        0.0923±0.0038  # 5
                                                                    PSNR         28.06±0.29     # 9
                                                                    FVD          377±6          # 10
                                                                    SSIM         0.8438±0.0054  # 10
                                                                    Cond         10             # 1
                                                                    Pred         30             # 17
                                                                    Train        10             # 1
Video Prediction  SynpickVP           SVG-LP                        MSE          51.82          # 2
                                                                    PSNR         27.38          # 5
                                                                    SSIM         0.886          # 2
                                                                    LPIPS        0.066          # 4
Video Prediction  SynpickVP           SVG-Det                       MSE          60.60          # 5
                                                                    PSNR         26.92          # 3
                                                                    SSIM         0.879          # 4
                                                                    LPIPS        0.068          # 5
