Skeleton-based Action Recognition with Convolutional Neural Networks

25 Apr 2017 · Chao Li, Qiaoyong Zhong, Di Xie, ShiLiang Pu

Current state-of-the-art approaches to skeleton-based action recognition are mostly based on recurrent neural networks (RNNs). In this paper, we propose a novel convolutional neural network (CNN)-based framework for both action classification and detection. Raw skeleton coordinates as well as skeleton motion are fed directly into the CNN for label prediction. A novel skeleton transformer module is designed to rearrange and select important skeleton joints automatically. With a simple 7-layer network, we obtain 89.3% accuracy on the validation set of the NTU RGB+D dataset. For action detection in untrimmed videos, we develop a window proposal network to extract temporal segment proposals, which are further classified within the same network. On the recent PKU-MMD dataset, we achieve 93.7% mAP, surpassing the baseline by a large margin.
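The two input streams and the skeleton transformer described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes skeleton motion is the frame-to-frame difference of joint coordinates (zero-padded at the end so both streams have length T), and it reads the skeleton transformer as a linear map over the joint axis, where a J x M weight matrix turns J input joints into M output joints. The function names `skeleton_motion` and `skeleton_transform` are hypothetical, and the weights are supplied by hand here, whereas in a trained network they would be learned.

```python
from typing import List

# A frame is a list of joints; each joint is an (x, y, z) coordinate triple.
Frame = List[List[float]]
Sequence = List[Frame]


def skeleton_motion(seq: Sequence) -> Sequence:
    """Frame-to-frame coordinate differences (hypothetical helper).

    Assumes T >= 1 frames. A zero frame is appended so the motion stream has
    the same length as the coordinate stream; the padding is an assumption.
    """
    motion = []
    for t in range(len(seq) - 1):
        motion.append([
            [b - a for a, b in zip(seq[t][j], seq[t + 1][j])]
            for j in range(len(seq[t]))
        ])
    num_joints = len(seq[0])
    motion.append([[0.0, 0.0, 0.0] for _ in range(num_joints)])
    return motion


def skeleton_transform(seq: Sequence, weights: List[List[float]]) -> Sequence:
    """Mix J input joints into M output joints (hypothetical helper).

    weights[j][m] is the contribution of input joint j to output joint m.
    The same J x M matrix is applied to every frame and every coordinate axis.
    """
    num_out = len(weights[0])
    out = []
    for frame in seq:
        new_frame = []
        for m in range(num_out):
            # Weighted sum over input joints, separately for x, y and z.
            new_frame.append([
                sum(frame[j][c] * weights[j][m] for j in range(len(frame)))
                for c in range(3)
            ])
        out.append(new_frame)
    return out
```

With one-hot columns in `weights`, the transform purely selects or reorders joints (e.g. `[[0.0, 1.0], [1.0, 0.0]]` swaps two joints); soft weights instead blend several joints into one output joint. Either result can then be stacked over time into a T x M x 3 array and treated like a 3-channel image by the CNN.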

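Task                               Dataset    Model              Metric         Value  Global rank
Skeleton Based Action Recognition  NTU RGB+D  CNN+Motion+Trans   Accuracy (CV)  89.3   #87
Skeleton Based Action Recognition  NTU RGB+D  CNN+Motion+Trans   Accuracy (CS)  83.2   #90
Skeleton Based Action Recognition  PKU-MMD    Li et al. (2017b)  mAP@0.50 (CV)  93.7   #3
Skeleton Based Action Recognition  PKU-MMD    Li et al. (2017b)  mAP@0.50 (CS)  90.4   #3

CV = cross-view protocol, CS = cross-subject protocol; mAP@0.50 = mean average precision at a temporal IoU threshold of 0.5.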
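The mAP@0.50 figures above count a detected segment as correct only when it overlaps a ground-truth segment of the same class with a temporal IoU of at least 0.5. A minimal sketch of that computation for one class follows, assuming greedy matching of confidence-ranked detections and non-interpolated AP; the function names are hypothetical, and the official PKU-MMD evaluation may differ in matching and interpolation details.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in frames


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[Segment, float]],
                      ground_truth: List[Segment],
                      iou_threshold: float = 0.5) -> float:
    """Non-interpolated AP for a single action class (sketch).

    detections: (segment, confidence) pairs. Each detection, in descending
    confidence order, is a true positive if its best still-unmatched
    ground-truth segment has IoU >= iou_threshold. AP is the sum of the
    precision values at each true positive divided by the number of
    ground-truth segments.
    """
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp = 0
    precision_sum = 0.0
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    for rank, (seg, _score) in enumerate(ranked, start=1):
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(ground_truth):
            if matched[i]:
                continue  # each ground-truth segment can be matched only once
            iou = temporal_iou(seg, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            tp += 1
            precision_sum += tp / rank
    return precision_sum / len(ground_truth)
```

For example, with ground truth `[(0, 10), (20, 30)]` and detections `(0, 9)`, `(40, 50)`, `(21, 30)` at decreasing confidence, the first and third detections match (IoU 0.9 each) and the second is a false positive, giving AP = (1/1 + 2/3) / 2 = 5/6. mAP is then the mean of these per-class AP values.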
