Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

Skeleton-based human action recognition has attracted great interest thanks to the easy accessibility of human skeleton data. Recently, there has been a trend of using very deep feedforward neural networks to model the 3D coordinates of joints without considering computational efficiency. In this paper, we propose a simple yet effective semantics-guided neural network (SGN) for skeleton-based action recognition. We explicitly introduce the high-level semantics of joints (joint type and frame index) into the network to enhance the feature representation capability. In addition, we exploit the relationship of joints hierarchically through two modules: a joint-level module for modeling the correlations of joints in the same frame, and a frame-level module for modeling the dependencies of frames by taking the joints in the same frame as a whole. A strong baseline is proposed to facilitate the study of this field. With a model size an order of magnitude smaller than most previous works, SGN achieves state-of-the-art performance on the NTU60, NTU120, and SYSU datasets. The source code is available at
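To make the idea of joint semantics concrete, the following is a minimal sketch, not the SGN implementation. It assumes the joint type and frame index are encoded as one-hot vectors, that a fixed random projection stands in for learned layers, and that max pooling is used to treat the joints of a frame as a whole. The function names, the projection `W`, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def add_joint_type(coords):
    """Append a one-hot joint-type code to each joint's 3D coordinates.

    coords: array of shape (T, J, 3) -> returns shape (T, J, 3 + J).
    """
    T, J, _ = coords.shape
    joint_type = np.broadcast_to(np.eye(J), (T, J, J))
    return np.concatenate([coords, joint_type], axis=-1)

def add_frame_index(frame_feats):
    """Append a one-hot frame-index code to each per-frame feature vector.

    frame_feats: array of shape (T, C) -> returns shape (T, C + T).
    """
    T = frame_feats.shape[0]
    return np.concatenate([frame_feats, np.eye(T)], axis=-1)

# Hypothetical toy sizes: 4 frames, 5 joints, 6 feature channels.
T, J, C = 4, 5, 6
rng = np.random.default_rng(0)
coords = rng.normal(size=(T, J, 3))

# Joint level: every joint carries its coordinates plus its joint type.
joint_feats = add_joint_type(coords)                # (T, J, 3 + J)

# Assumed stand-in for learned layers: random projection followed by ReLU.
W = rng.normal(size=(3 + J, C))
hidden = np.maximum(joint_feats @ W, 0.0)           # (T, J, C)

# Frame level: pool over joints so each frame is treated as a whole,
# then attach the frame index to each frame vector.
frame_feats = hidden.max(axis=1)                    # (T, C)
seq_feats = add_frame_index(frame_feats)            # (T, C + T)

print(joint_feats.shape, frame_feats.shape, seq_feats.shape)
```

In this sketch the joint type is attached before the joint-level step and the frame index only after the joints of a frame have been aggregated, which mirrors the hierarchical description above. How the paper actually encodes and fuses these semantics is not stated in the abstract.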

CVPR 2020


Task                               Dataset    Model  Metric Name    Metric Value  Global Rank
Skeleton Based Action Recognition  NTU RGB+D  SGN    Accuracy (CV)  93.4          # 60
Skeleton Based Action Recognition  NTU RGB+D  SGN    Accuracy (CS)  86.6          # 60
Skeleton Based Action Recognition  N-UCLA     SGN    Accuracy       92.5%         # 14
Skeleton Based Action Recognition  SYSU 3D    SGN    Accuracy       86.9%         # 1

