Stochastic Latent Residual Video Prediction

Designing video prediction models that account for the inherent uncertainty of the future is challenging. Most works in the literature are based on stochastic image-autoregressive recurrent networks, an approach that raises several performance and applicability issues. An alternative is to use fully latent temporal models, which untie frame synthesis from temporal dynamics. However, no such model for stochastic video prediction has been proposed in the literature yet, due to design and training difficulties. In this paper, we overcome these difficulties by introducing a novel stochastic temporal model whose dynamics are governed in a latent space by a residual update rule. This first-order scheme is motivated by discretization schemes of differential equations. It naturally models video dynamics and allows our simpler, more interpretable latent model to outperform prior state-of-the-art methods on challenging datasets.
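As a rough illustration of the residual update described above, here is a minimal PyTorch sketch, not the authors' implementation: the latent state evolves through a first-order step y_{t+1} = y_t + f(y_t, z_{t+1}), where z_{t+1} is a stochastic variable drawn from a learned prior, and frames would be decoded from each state by a separate decoder. All module names, layer sizes, and the Gaussian prior parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualLatentDynamics(nn.Module):
    """Sketch of a first-order (Euler-like) stochastic latent update."""

    def __init__(self, state_dim=128, z_dim=20, hidden=256):
        super().__init__()
        # Learned prior over the stochastic variable z_{t+1}, conditioned on y_t.
        self.prior = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim),  # mean and log-variance
        )
        # Residual function f(y_t, z_{t+1}) producing the latent increment.
        self.residual = nn.Sequential(
            nn.Linear(state_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def step(self, y_t):
        # Sample z_{t+1} from the prior via the reparameterization trick.
        mu, logvar = self.prior(y_t).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # First-order residual update: y_{t+1} = y_t + f(y_t, z_{t+1}).
        return y_t + self.residual(torch.cat([y_t, z], dim=-1))

    def rollout(self, y_0, horizon):
        # Predict future states purely in latent space, autoregressively.
        states, y = [], y_0
        for _ in range(horizon):
            y = self.step(y)
            states.append(y)
        return torch.stack(states, dim=1)  # (batch, horizon, state_dim)
```

In such a design the rollout only iterates the latent update; a frame decoder would run independently on each predicted state, which is what unties frame synthesis from temporal dynamics.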

PDF Abstract (ICML 2020)

Results from the Paper


Video Generation · BAIR Robot Pushing · SRVP
  FVD score: 162 ± 4 (global rank #21)
  Cond: 2 (#13)
  SSIM: 0.8196 ± 0.0084 (#5)
  PSNR: 19.59 ± 0.27 (#4)
  LPIPS: 0.0574 ± 0.0032 (#9)
  Pred: 28 (#20)
  Train: 12 (#18)

Video Prediction · Cityscapes 128x128 · SRVP
  SSIM: 0.603 ± 0.016 (global rank #4)
  LPIPS: 0.447 ± 0.014 (#4)
  Cond: 10 (#4)
  PSNR: 20.97 ± 0.43 (#2)
  Pred: 20 (#1)

Video Prediction · KTH · SRVP
  LPIPS: 0.0736 ± 0.0029 (global rank #2)
  PSNR: 29.69 ± 0.32 (#2)
  FVD: 222 ± 3 (#6)
  SSIM: 0.8697 ± 0.0046 (#6)
  Cond: 10 (#1)
  Pred: 30 (#17)
  Train: 10 (#1)

Video Prediction · KTH 64x64 cond10 pred30 · SRVP
  FVD: 222 (global rank #1)
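For reference, here is a minimal sketch of how a frame-wise metric such as the PSNR values above is typically computed against ground truth. This is the generic definition, not the exact evaluation script behind these numbers; the array shapes and the aggregation note are assumptions.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Per-frame PSNR between predicted and ground-truth videos.

    pred, target: float arrays in [0, max_val] with shape (T, H, W, C).
    Returns T PSNR values in dB; reported benchmark numbers aggregate
    over frames and test sequences (handling of stochastic samples
    depends on the evaluation protocol).
    """
    mse = np.mean((pred - target) ** 2, axis=(1, 2, 3))
    return 10.0 * np.log10((max_val ** 2) / mse)
```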

Methods


No methods listed for this paper.