Stochastic Video Generation with a Learned Prior
Generating video frames that accurately predict future world states is challenging. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this paper we introduce an unsupervised video generation model that learns a prior model of uncertainty in a given environment. Video frames are generated by drawing samples from this prior and combining them with a deterministic estimate of the future frame. The approach is simple and easily trained end-to-end on a variety of datasets. Sample generations are both varied and sharp, even many frames into the future, and compare favorably to those from existing approaches.
ICML 2018
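The generation procedure summarized in the abstract (draw a sample from a prior, then combine it with a deterministic estimate of the next frame) can be illustrated with a minimal toy sketch. This is not the paper's implementation: frames are short lists of floats, and `toy_prior`, `toy_predictor`, and `generate` are hypothetical placeholders standing in for the learned networks, using only the Python standard library.

```python
import math
import random


def toy_prior(prev_frame):
    """Hypothetical stand-in for a learned prior.

    Maps the previous frame to the mean and log-variance of a
    Gaussian over the latent variable for the next step.
    """
    mean = sum(prev_frame) / len(prev_frame)
    log_var = -2.0  # fixed here; a learned prior would predict this as well
    return mean, log_var


def toy_predictor(prev_frame, z):
    """Hypothetical stand-in for the deterministic frame predictor.

    Combines the previous frame with the sampled latent z.
    """
    return [0.9 * p + 0.1 * z for p in prev_frame]


def generate(context_frames, n_future, rng):
    """Roll out n_future frames, sampling one latent per step from the prior."""
    frames = list(context_frames)
    for _ in range(n_future):
        prev = frames[-1]
        mean, log_var = toy_prior(prev)
        std = math.exp(0.5 * log_var)
        z = rng.gauss(mean, std)  # draw z ~ N(mean, std^2)
        frames.append(toy_predictor(prev, z))
    return frames[len(context_frames):]


# Two rollouts from the same context with different random seeds.
context = [[0.0, 0.5, 1.0], [0.1, 0.5, 0.9]]
sample_a = generate(context, n_future=5, rng=random.Random(0))
sample_b = generate(context, n_future=5, rng=random.Random(1))
```

Because a fresh latent is drawn at every step, repeated rollouts from the same context frames diverge from one another, which is the sense in which the generations are "varied"; fixing the seed makes a rollout reproducible.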
Datasets
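Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Video Generation | BAIR Robot Pushing | SVG-LP (from vRNN) | FVD score | 256.62 | # 24 |
Video Generation | BAIR Robot Pushing | SVG-LP (from vRNN) | Cond | 2 | # 13 |
Video Generation | BAIR Robot Pushing | SVG-LP (from vRNN) | SSIM | 0.816±0.07 | # 8 |
Video Generation | BAIR Robot Pushing | SVG-LP (from vRNN) | LPIPS | 0.061±0.03 | # 5 |
Video Generation | BAIR Robot Pushing | SVG-LP (from vRNN) | Pred | 28 | # 20 |
Video Generation | BAIR Robot Pushing | SVG-LP (from vRNN) | Train | 10 | # 23 |
Video Generation | BAIR Robot Pushing | SVG (from SRVP) | FVD score | 255±4 | # 23 |
Video Generation | BAIR Robot Pushing | SVG (from SRVP) | Cond | 2 | # 13 |
Video Generation | BAIR Robot Pushing | SVG (from SRVP) | SSIM | 0.8058±0.0088 | # 10 |
Video Generation | BAIR Robot Pushing | SVG (from SRVP) | PSNR | 18.95±0.26 | # 7 |
Video Generation | BAIR Robot Pushing | SVG (from SRVP) | LPIPS | 0.0609±0.0034 | # 6 |
Video Generation | BAIR Robot Pushing | SVG (from SRVP) | Pred | 28 | # 20 |
Video Generation | BAIR Robot Pushing | SVG (from SRVP) | Train | 12 | # 18 |
Video Generation | BAIR Robot Pushing | SVG-FP (from FVD) | FVD score | 315.5 | # 27 |
Video Generation | BAIR Robot Pushing | SVG-FP (from FVD) | Cond | 2 | # 13 |
Video Generation | BAIR Robot Pushing | SVG-FP (from FVD) | Pred | 14 | # 2 |
Video Generation | BAIR Robot Pushing | SVG-FP (from FVD) | Train | 14 | # 12 |
Video Prediction | Cityscapes 128x128 | SVG (from Hier-VRNN) | FVD | 1300.26 | # 3 |
Video Prediction | Cityscapes 128x128 | SVG (from Hier-VRNN) | SSIM | 0.574±0.08 | # 5 |
Video Prediction | Cityscapes 128x128 | SVG (from Hier-VRNN) | LPIPS | 0.549±0.06 | # 5 |
Video Prediction | Cityscapes 128x128 | SVG (from Hier-VRNN) | Cond | 2 | # 1 |
Video Prediction | Cityscapes 128x128 | SVG (from Hier-VRNN) | Pred | 28 | # 3 |
Video Prediction | Cityscapes 128x128 | SVG (from Hier-VRNN) | Train | 10 | # 1 |
Video Prediction | KTH | SVG-LP (from SRVP) | LPIPS | 0.0923±0.0038 | # 5 |
Video Prediction | KTH | SVG-LP (from SRVP) | PSNR | 28.06±0.29 | # 9 |
Video Prediction | KTH | SVG-LP (from SRVP) | FVD | 377±6 | # 10 |
Video Prediction | KTH | SVG-LP (from SRVP) | SSIM | 0.8438±0.0054 | # 10 |
Video Prediction | KTH | SVG-LP (from SRVP) | Cond | 10 | # 1 |
Video Prediction | KTH | SVG-LP (from SRVP) | Pred | 30 | # 17 |
Video Prediction | KTH | SVG-LP (from SRVP) | Train | 10 | # 1 |
Video Prediction | SynpickVP | SVG-Det | MSE | 60.60 | # 5 |
Video Prediction | SynpickVP | SVG-Det | PSNR | 26.92 | # 3 |
Video Prediction | SynpickVP | SVG-Det | SSIM | 0.879 | # 4 |
Video Prediction | SynpickVP | SVG-Det | LPIPS | 0.068 | # 5 |
Video Prediction | SynpickVP | SVG-LP | MSE | 51.82 | # 2 |
Video Prediction | SynpickVP | SVG-LP | PSNR | 27.38 | # 5 |
Video Prediction | SynpickVP | SVG-LP | SSIM | 0.886 | # 2 |
Video Prediction | SynpickVP | SVG-LP | LPIPS | 0.066 | # 4 |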