TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Generation	BAIR Robot Pushing	SV2P (from FVD)	FVD score	262.5	# 25
Video Generation	BAIR Robot Pushing	SV2P (from FVD)	Cond	2	# 13
Video Generation	BAIR Robot Pushing	SV2P (from FVD)	Pred	14	# 2
Video Generation	BAIR Robot Pushing	SV2P (from FVD)	Train	14	# 12
Video Generation	BAIR Robot Pushing	SV2P (from SRVP)	FVD score	965±17	# 30
Video Generation	BAIR Robot Pushing	SV2P (from SRVP)	Cond	2	# 13
Video Generation	BAIR Robot Pushing	SV2P (from SRVP)	SSIM	0.8169±0.0086	# 7
Video Generation	BAIR Robot Pushing	SV2P (from SRVP)	PSNR	20.39±0.27	# 2
Video Generation	BAIR Robot Pushing	SV2P (from SRVP)	LPIPS	0.0912±0.0053	# 2
Video Generation	BAIR Robot Pushing	SV2P (from SRVP)	Pred	28	# 20
Video Generation	BAIR Robot Pushing	SV2P (from SRVP)	Train	12	# 18
Video Prediction	KTH	SV2P time-variant (from Grid-keypoints)	LPIPS	0.232	# 15
Video Prediction	KTH	SV2P time-variant (from Grid-keypoints)	PSNR	25.87	# 24
Video Prediction	KTH	SV2P time-variant (from Grid-keypoints)	FVD	209.5	# 5
Video Prediction	KTH	SV2P time-variant (from Grid-keypoints)	SSIM	0.782	# 23
Video Prediction	KTH	SV2P time-variant (from Grid-keypoints)	Cond	10	# 1
Video Prediction	KTH	SV2P time-variant (from Grid-keypoints)	Pred	40	# 22
Video Prediction	KTH	SV2P time-variant (from Grid-keypoints)	Params (M)	8.3	# 4
Video Prediction	KTH	SV2P time-variant (from Grid-keypoints)	Train	10	# 1
Video Prediction	KTH	SV2P time-invariant (from Grid-keypoints)	LPIPS	0.260	# 16
Video Prediction	KTH	SV2P time-invariant (from Grid-keypoints)	PSNR	25.70	# 25
Video Prediction	KTH	SV2P time-invariant (from Grid-keypoints)	FVD	253.5	# 8
Video Prediction	KTH	SV2P time-invariant (from Grid-keypoints)	SSIM	0.772	# 24
Video Prediction	KTH	SV2P time-invariant (from Grid-keypoints)	Cond	10	# 1
Video Prediction	KTH	SV2P time-invariant (from Grid-keypoints)	Pred	40	# 22
Video Prediction	KTH	SV2P time-invariant (from Grid-keypoints)	Params (M)	8.3	# 4
Video Prediction	KTH	SV2P time-invariant (from Grid-keypoints)	Train	10	# 1
Video Prediction	KTH	SV2P (from SRVP)	LPIPS	0.2049±0.0053	# 13
Video Prediction	KTH	SV2P (from SRVP)	PSNR	28.19±0.31	# 8
Video Prediction	KTH	SV2P (from SRVP)	FVD	636±1	# 12
Video Prediction	KTH	SV2P (from SRVP)	SSIM	0.838	# 13
Video Prediction	KTH	SV2P (from SRVP)	Cond	10	# 1
Video Prediction	KTH	SV2P (from SRVP)	Pred	30	# 17
Video Prediction	KTH	SV2P (from SRVP)	Train	10	# 1
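
Several rows above report PSNR. As a reading aid, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), in decibels. The function name `psnr`, the flat-list frame representation, and the default pixel range of [0, 1] are assumptions made for illustration. This page does not state the pixel range or the per-frame averaging protocol behind the reported numbers.

```python
import math


def psnr(reference, prediction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are flat sequences of pixel intensities in [0, max_value].
    The pixel range is an assumption; the page does not specify it.
    """
    if len(reference) != len(prediction):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the predicted frame is closer to the ground truth in the mean-squared-error sense, which is why it can disagree with perceptual metrics such as LPIPS or FVD in the same table.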

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/stochastic-variational-video-prediction/video-prediction-on-kth)](https://paperswithcode.com/sota/video-prediction-on-kth?p=stochastic-variational-video-prediction)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/stochastic-variational-video-prediction/video-generation-on-bair-robot-pushing)](https://paperswithcode.com/sota/video-generation-on-bair-robot-pushing?p=stochastic-variational-video-prediction)`

Stochastic Variational Video Prediction

ICLR 2018 · Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine

Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world video. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open-sourced upon publication.
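
The idea that each latent sample yields a different future can be illustrated with a toy sketch. This is not the SV2P architecture: a 1-D position stands in for a video frame, the hypothetical `predict_future` function is a hand-written deterministic rule and not a learned network, and drawing the latent from a standard normal is an assumption made for illustration.

```python
import random


def predict_future(last_position, latent, num_frames):
    """Toy deterministic predictor: the latent fixes a per-frame velocity."""
    return [last_position + latent * (t + 1) for t in range(num_frames)]


def sample_futures(last_position, num_samples, num_frames, seed=0):
    """Draw several latents and roll out one future trajectory per latent."""
    rng = random.Random(seed)
    futures = []
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)  # one latent sample (assumed standard normal)
        futures.append(predict_future(last_position, z, num_frames))
    return futures


if __name__ == "__main__":
    for trajectory in sample_futures(0.0, num_samples=3, num_frames=4):
        print([round(x, 3) for x in trajectory])
```

Given a fixed latent the rollout is deterministic, so all variation between predicted futures comes from the sampled latent, which is the property the abstract describes.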

Code

StanfordVL/roboturk_real_dataset

RoboTurk-Platform/roboturk_real_dat…

suraj-nair-1/google-research

Tasks

Video Generation

Video Prediction

Datasets

KTH BAIR Robot Pushing

Robotic Pushing

Results from the Paper

Ranked #5 on Video Prediction on KTH
