Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset...

CVPR 2017
Task | Dataset | Model | Metric | Value (%) | Global rank
Action Classification | Charades | I3D | mAP | 32.9 | #11
Hand Gesture Recognition | EgoGesture | I3D | Accuracy | 92.78 | #3
Action Recognition | HMDB-51 | Two-stream I3D | Average accuracy of 3 splits | 80.9 | #10
Skeleton Based Action Recognition | J-HMDB | I3D | Accuracy (RGB+pose) | 84.1 | #4
Action Classification | Moments in Time | I3D | Top-1 accuracy | 29.51 | #4
Action Classification | Moments in Time | I3D | Top-5 accuracy | 56.06 | #3
Action Recognition | UCF101 | Two-stream I3D (on pre-trained) | 3-fold accuracy | 98.0 | #5
Action Recognition | UCF101 | Two-stream I3D | 3-fold accuracy | 93.4 | #21
Hand Gesture Recognition | VIVA Hand Gestures Dataset | I3D | Accuracy | 83.1 | #2
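Several results here (e.g. Charades) are reported as mAP. As a rough illustration only, the sketch below computes a generic, non-interpolated mean average precision over classes in plain Python. The function names are hypothetical, and the official benchmark evaluation scripts may differ in details such as tie-breaking or which classes enter the average.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at the rank of each positive."""
    # Rank examples by descending score (stable sort keeps input order on ties).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(scores: List[List[float]], labels: List[List[int]]) -> float:
    """scores[n][c] and labels[n][c] for N videos and C classes (multi-label).

    Averages AP over the classes that have at least one positive example
    (an assumption; other protocols may treat empty classes differently).
    """
    num_classes = len(scores[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in scores]
        class_labels = [row[c] for row in labels]
        if any(class_labels):
            aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / len(aps)
```

For example, a class whose two positives land at ranks 1 and 3 gets AP = (1/1 + 2/3) / 2, about 0.83.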

Results from Other Papers

Task | Dataset | Model | Metric | Value (%) | Global rank
Action Classification | Kinetics-400 | I3D | Top-1 accuracy | 71.1 | #34
Action Classification | Kinetics-400 | I3D | Top-5 accuracy | 89.3 | #14
Semantic Object Interaction Classification | VLOG | I3D | mAP | 39.7 | #3
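The Kinetics-400 and Moments in Time results are reported as top-1 and top-5 accuracy. A minimal sketch of that metric, assuming one ground-truth class per video and a list of per-class scores per video; the function name is hypothetical and not taken from any benchmark's official evaluation code:

```python
from typing import List, Sequence


def top_k_accuracy(scores: List[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of videos whose true class is among the k highest-scoring classes."""
    correct = 0
    for row, target in zip(scores, targets):
        # Class indices of the k largest scores for this video.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if target in top_k:
            correct += 1
    return correct / len(targets)


scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]
targets = [2, 0, 0]
print(top_k_accuracy(scores, targets, k=1))  # 1 of 3 videos correct at top-1
print(top_k_accuracy(scores, targets, k=2))  # 2 of 3 videos correct at top-2
```

Top-5 accuracy is the same function with k=5; it is always at least as high as top-1, which is why each benchmark reports a larger top-5 figure.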
