Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction

28 Nov 2018 · Longlong Jing, Xiaodong Yang, Jingen Liu, Yingli Tian

The success of deep neural networks generally requires a vast amount of labeled training data, which is expensive and infeasible at scale, especially for video collections. To alleviate this problem, we propose 3DRotNet: a fully self-supervised approach to learn spatiotemporal features from unlabeled videos...
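As the title indicates, the pretext task is predicting the rotation applied to a video, so the supervision comes from the transformation itself rather than from human labels. The sketch below illustrates how such training pairs might be generated. It is a minimal illustration under stated assumptions, not the authors' implementation: the four-angle label set (0, 90, 180, 270 degrees), the nested-list clip representation, and the function names `rotate_clip` and `make_pretext_batch` are all hypothetical.

```python
import random

# Assumed label set: four in-plane rotations; label k means k * 90 degrees.
ROTATIONS = (0, 90, 180, 270)


def rotate_frame_90(frame):
    """Rotate one frame (a list of rows) 90 degrees counterclockwise."""
    return [list(row) for row in zip(*frame)][::-1]


def rotate_clip(clip, label):
    """Apply the same rotation (label * 90 degrees) to every frame of a clip."""
    # Copy first so the input clip is never modified.
    rotated = [[list(row) for row in frame] for frame in clip]
    for _ in range(label):
        rotated = [rotate_frame_90(frame) for frame in rotated]
    return rotated


def make_pretext_batch(clips, rng=None):
    """Pair each unlabeled clip with a randomly chosen rotation label."""
    if rng is None:
        rng = random.Random(0)
    batch = []
    for clip in clips:
        label = rng.randrange(len(ROTATIONS))
        batch.append((rotate_clip(clip, label), label))
    return batch
```

A network trained on such pairs would take the rotated clip as input and classify which label was applied. The learned features could then be evaluated on a downstream task such as action recognition.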


Results from the Paper


#8 best model for Self-Supervised Action Recognition on UCF101 (using extra training data)

Task: Self-Supervised Action Recognition
Dataset: UCF101
Model: 3D RotNet (3D ResNet-18)
Metric name: 3-fold Accuracy
Metric value: 62.9
Global rank: #8