Adversarial Video Generation on Complex Datasets

15 Jul 2019  ·  Aidan Clark, Jeff Donahue, Karen Simonyan

Generative models of natural images have progressed towards high-fidelity samples largely by leveraging scale. We attempt to carry this success to the field of video modeling by showing that large Generative Adversarial Networks trained on the complex Kinetics-600 dataset are able to produce video samples of substantially higher complexity and fidelity than previous work. Our proposed model, Dual Video Discriminator GAN (DVD-GAN), scales to longer and higher-resolution videos by leveraging a computationally efficient decomposition of its discriminator. We evaluate on the related tasks of video synthesis and video prediction, and achieve a new state-of-the-art Fréchet Inception Distance for prediction on Kinetics-600, as well as a state-of-the-art Inception Score for synthesis on the UCF-101 dataset, alongside establishing a strong baseline for synthesis on Kinetics-600.
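
The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, the inputs (class probabilities and feature statistics) are assumed to have been computed beforehand by a pretrained classifier, and the Fréchet distance is restricted to diagonal covariances so that it needs only the standard library. The full metric uses complete covariance matrices and a matrix square root.

```python
import math


def inception_score(probs):
    """Inception Score from per-sample class probabilities p(y|x).

    probs: list of rows, each a probability distribution over classes.
    Returns exp of the mean KL divergence between p(y|x) and the marginal p(y).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all samples.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    total_kl = 0.0
    for row in probs:
        # Terms with p(y|x) = 0 contribute nothing to the KL divergence.
        total_kl += sum(
            p * math.log(p / m) for p, m in zip(row, marginal) if p > 0
        )
    return math.exp(total_kl / n)


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with DIAGONAL covariances.

    Simplifying assumption: with diagonal covariances the trace term
    Tr(S1 + S2 - 2 (S1 S2)^(1/2)) reduces to sum((sqrt(v1) - sqrt(v2))^2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2)
    )
    return mean_term + cov_term
```

Under this sketch, two confident samples from different classes, `inception_score([[1.0, 0.0], [0.0, 1.0]])`, give a score of 2 (the number of classes), while uniform predictions give 1. Higher Inception Score and lower Fréchet distance indicate better samples.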

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Video Prediction | BAIR Robot Pushing | DVD-GAN-FP | FVD | 109.8 | #5 |
| Video Generation | BAIR Robot Pushing | DVD-GAN-FP | FVD score | 109.8 | #11 |
| | | | Cond | 1 | #1 |
| | | | Pred | 15 | #8 |
| | | | Train | 15 | #2 |
| Video Generation | Kinetics-600 12 frames, 128x128 | DVD-GAN | FID | 2.16 | #1 |
| Video Generation | Kinetics-600 12 frames, 64x64 | DVD-GAN | FVD | 31.1 | #4 |
| Video Prediction | Kinetics-600 12 frames, 64x64 | DVD-GAN-FP | FVD | 69.15±0.78 | #11 |
| | | | Cond | 5 | #2 |
| | | | Pred | 11 | #2 |
| Video Generation | Kinetics-600 48 frames, 64x64 | DVD-GAN | FID | 12.92 | #1 |
| | | | Inception Score | 219.05 | #1 |