Efficient and Information-Preserving Future Frame Prediction and Beyond

ICLR 2020  ·  Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler

Applying resolution-preserving blocks is a common practice to maximize information preservation in video prediction, yet their high memory consumption greatly limits their application scenarios. We propose CrevNet, a Conditionally Reversible Network that uses reversible architectures to build a bijective two-way autoencoder and its complementary recurrent predictor. Our model enjoys a theoretically guaranteed absence of information loss during feature extraction, substantially lower memory consumption, and higher computational efficiency. The lightweight nature of our model enables us to incorporate 3D convolutions without concern for memory bottlenecks, enhancing the model's ability to capture both short-term and long-term temporal dependencies. Our proposed approach achieves state-of-the-art results on the Moving MNIST, Traffic4cast and KITTI datasets. We further demonstrate the transferability of our self-supervised learning method by exploiting its learnt features for object detection on KITTI. Our competitive results indicate the potential of using CrevNet as a generative pre-training strategy to guide downstream tasks.
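The "no information loss" guarantee comes from the reversible (RevNet-style) coupling structure underlying the two-way autoencoder: the input is split into two halves, and each half is updated additively by a learned transform of the other, so the inverse can be computed exactly. The sketch below illustrates this coupling idea in NumPy; the transforms `F` and `G` are hypothetical stand-ins (in the actual model they would be convolutional, e.g. 3D convolutions), not the paper's implementation.

```python
import numpy as np

# Stand-in learned transforms (hypothetical; CrevNet uses conv-based ones).
def F(x):
    return np.tanh(x)

def G(x):
    return np.sin(x)

def forward(x1, x2):
    # Additive coupling: each output half adds a transform of the other half.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Exact inverse: subtract the same transforms in reverse order,
    # so the mapping is bijective and no information is lost.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(8), rng.standard_normal(8)
r1, r2 = inverse(*forward(x1, x2))
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # prints True
```

A practical side effect of this structure is memory efficiency: activations need not be stored for backpropagation, since each layer's inputs can be recomputed from its outputs.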


Results from the Paper

Task              Dataset       Model             Metric  Value  Global Rank
Video Prediction  Moving MNIST  CrevNet+ST-LSTM   MSE     22.3   #14
Video Prediction  Moving MNIST  CrevNet+ST-LSTM   SSIM    0.949  #11
Video Prediction  Moving MNIST  CrevNet+ConvLSTM  MSE     38.5   #20
Video Prediction  Moving MNIST  CrevNet+ConvLSTM  SSIM    0.928  #15