Stochastic Adversarial Video Prediction

Being able to predict what may happen in the future requires an in-depth understanding of the physical and causal rules that govern the world. A model that is able to do so has a number of appealing applications, from robotic planning to representation learning. However, learning to predict raw future observations, such as frames in a video, is exceedingly challenging: the ambiguous nature of the problem can cause a naively designed model to average together possible futures into a single, blurry prediction. Recently, this has been addressed by two distinct approaches: (a) variational latent variable models that explicitly model underlying stochasticity and (b) adversarially trained models that aim to produce naturalistic images. However, a standard latent variable model can struggle to produce realistic results, and a standard adversarially trained model underutilizes latent variables and fails to produce diverse predictions. We show that these distinct methods are in fact complementary. Combining the two produces predictions that look more realistic to human raters and better cover the range of possible futures. Our method outperforms prior and concurrent work in both respects.
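Why does a naive model average possible futures into a blurry prediction? A toy case makes it concrete. The setup below is hypothetical and not taken from the paper: a 3-pixel frame in which an object ends up at the left or the right edge with equal probability. Under mean squared error, the single prediction that minimizes the expected loss is the per-pixel mean of the futures, which scores better than either sharp future while matching neither. `latent_model` is an illustrative stand-in in which a sampled latent `z` selects one future; real latent variable models use continuous latent codes.

```python
# Toy illustration (hypothetical setup): a 3-pixel frame in which an object
# ends up at the left or the right edge with equal probability.
futures = [
    [1.0, 0.0, 0.0],  # object moved left
    [0.0, 0.0, 1.0],  # object moved right
]


def mse(pred, target):
    """Mean squared error between two frames of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def expected_mse(pred):
    """Average MSE of one prediction against every equally likely future."""
    return sum(mse(pred, f) for f in futures) / len(futures)


# The single prediction minimizing expected MSE is the per-pixel mean.
mean_pred = [sum(pixel) / len(futures) for pixel in zip(*futures)]


def latent_model(z):
    """Hypothetical stand-in for a latent variable model: z selects one future."""
    return futures[z]


if __name__ == "__main__":
    print(mean_pred)                           # [0.5, 0.0, 0.5]: a faint object in both places
    print(expected_mse(mean_pred))             # about 0.167
    print(expected_mse(futures[0]))            # about 0.333: the sharp future scores worse
    print([latent_model(z) for z in (0, 1)])   # each sample is a sharp, plausible future
```

The blurry mean wins under a pixel-wise loss, which is why a deterministic predictor drifts toward it. Sampling a latent lets each prediction commit to one future, and an adversarial loss pushes against outputs like `mean_pred` because they do not look like any real frame.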

ICLR 2019

Results from the Paper


 Ranked #1 on Video Prediction on KTH (Cond metric)

Each cell gives the metric value followed by its global rank on the corresponding leaderboard. Cond and Pred give the numbers of conditioning and predicted frames, and Train refers to the number of frames used during training. The name in parentheses identifies the paper whose evaluation produced the result.

| Task | Dataset | Model | Cond | Pred | Train | FVD | PSNR | SSIM | LPIPS | Params (M) |
|---|---|---|---|---|---|---|---|---|---|---|
| Video Generation | BAIR Robot Pushing | SAVP (from FVD) | 2 (#13) | 14 (#2) | 14 (#12) | 116.4 (#12) | | | | |
| Video Generation | BAIR Robot Pushing | SAVP-VAE (from WAM) | 2 (#13) | 28 (#20) | 14 (#12) | | 19.09 (#6) | 0.815 (#9) | | |
| Video Generation | BAIR Robot Pushing | SAVP (from SRVP) | 2 (#13) | 28 (#20) | 12 (#18) | 152±9 (#19) | 18.44±0.25 (#8) | 0.7887±0.0092 (#12) | 0.0634±0.0026 (#3) | |
| Video Generation | BAIR Robot Pushing | SAVP (from vRNN) | 2 (#13) | 28 (#20) | 10 (#23) | 143.43 (#17) | | 0.795±0.07 (#11) | 0.062±0.03 (#4) | |
| Video Prediction | KTH | SAVP (from Grid-keypoints) | 10 (#1) | 40 (#22) | 10 (#1) | 183.7 (#4) | 23.79 (#28) | 0.699 (#30) | 0.126 (#9) | 17.6 (#6) |
| Video Prediction | KTH | SAVP-VAE (from Grid-keypoints) | 10 (#1) | 40 (#22) | 10 (#1) | 145.7 (#2) | 26.00 (#22) | 0.806 (#17) | 0.116 (#7) | 7.3 (#3) |
| Video Prediction | KTH | SAVP-VAE | 10 (#1) | 20 (#1) | | | 27.77 (#11) | 0.852 (#9) | | |
| Video Prediction | KTH | SAVP (from SRVP) | 10 (#1) | 30 (#17) | 10 (#1) | 374±3 (#9) | 26.51±0.29 (#19) | 0.7564±0.0062 (#27) | 0.1120±0.0039 (#6) | |
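PSNR, one of the metrics reported above, compares a predicted frame with the ground-truth frame pixel by pixel; higher is better. The following is a minimal sketch, assuming frames are flat lists of intensities scaled to [0, 1]. The evaluations cited above may differ in value range, color handling, and how scores are averaged over frames and samples. `psnr` and `sequence_psnr` are illustrative names, not functions from any benchmark codebase, and per-frame averaging in `sequence_psnr` is an assumption.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two frames given as flat lists
    of pixel intensities in [0, max_val]. Higher is better."""
    if len(pred) != len(target) or not pred:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)


def sequence_psnr(pred_frames, target_frames):
    """Average per-frame PSNR over the predicted frames of one video
    (an assumed averaging scheme)."""
    scores = [psnr(p, t) for p, t in zip(pred_frames, target_frames)]
    if not scores:
        raise ValueError("need at least one frame")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    target = [0.6, 0.6, 0.6, 0.6]
    pred = [0.5, 0.5, 0.5, 0.5]
    print(round(psnr(pred, target), 2))  # 20.0: every pixel is off by 0.1
```

Because PSNR rewards pixel-wise closeness rather than realism, it is usually read alongside perceptual metrics such as LPIPS and FVD, which appear in the same table.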
