Generating Videos with Scene Dynamics

NeurIPS 2016. Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background...

TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK
Self-Supervised Action Recognition | UCF101 | VideoGan (C3D) | 3-fold Accuracy | 52.1 | #14
Video Generation | UCF-101 16 frames, 64x64, Unconditional | VGAN | Inception Score | 8.18 | #5
Video Generation | UCF-101 16 frames, Unconditional, Single GPU | VGAN | Inception Score | 8.18 | #5