Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

25 Aug 2017 · Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh

Convolutional neural networks with spatio-temporal 3D kernels (3D CNNs) have an ability to directly extract spatio-temporal features from videos for action recognition. Although the 3D kernels tend to overfit because of a large number of their parameters, the 3D CNNs are greatly improved by using recent huge video databases...
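The abstract's point about parameter count can be illustrated with a rough calculation. The sketch below compares the number of learnable parameters in a single 2D convolution layer with its 3D counterpart. The channel counts (64 in, 64 out) and the 3x3 and 3x3x3 kernel sizes are illustrative assumptions, not values taken from the paper, and `conv_params` is a hypothetical helper written for this example.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Count weights plus biases for one convolution layer.

    kernel_size is a tuple: (h, w) for a 2D kernel, (t, h, w) for a 3D kernel.
    """
    weights_per_filter = in_channels
    for k in kernel_size:
        weights_per_filter *= k
    # Each output filter has its own weights and one bias term.
    return out_channels * (weights_per_filter + 1)


# Assumed layer shape for illustration only: 64 input and 64 output channels.
params_2d = conv_params(64, 64, (3, 3))
params_3d = conv_params(64, 64, (3, 3, 3))

print("2D 3x3 layer:  ", params_2d)
print("3D 3x3x3 layer:", params_3d)
print("ratio:         ", round(params_3d / params_2d, 2))
```

Under these assumptions, adding a temporal extent of 3 to the kernel roughly triples the parameters in the layer, which is one way to see why 3D kernels are more prone to overfitting on small video datasets.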


Evaluation Results from the Paper


TASK                          DATASET                     MODEL  METRIC NAME  METRIC VALUE  GLOBAL RANK
Action Recognition In Videos  EgoGesture                  C3D    Accuracy     92.78         # 3
Hand-Gesture Recognition      EgoGesture                  C3D    Accuracy     89.7          # 4
Action Recognition In Videos  VIVA Hand Gestures Dataset  C3D    Accuracy     77.4          # 3