Video Representation Learning by Dense Predictive Coding

10 Sep 2019 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions: First, we introduce the Dense Predictive Coding (DPC) framework for self-supervised representation learning on videos...


Results from the Paper


 SOTA for Self-Supervised Action Recognition on UCF101 (using extra training data)

Task                                Dataset  Model                                      Metric           Value  Global rank
Self-Supervised Action Recognition  UCF101   DPC (3D ResNet-34)                         3-fold Accuracy  75.7   # 1
Self-Supervised Action Recognition  UCF101   DPC (3D ResNet-18)                         3-fold Accuracy  60.6   # 9
Self-Supervised Action Recognition  UCF101   DPC (3D ResNet-18, extra training data)    3-fold Accuracy  68.2   # 2