A Semantics-Guided Graph Convolutional Network for Skeleton-Based Action Recognition

Action recognition from skeleton data is a challenging task in computer vision. Graph convolutional networks (GCNs), which model the human body skeleton directly as a graph, have achieved remarkable performance. However, current GCN architectures are limited by the small receptive field of their convolution filters: they capture only local physical dependencies among joints and treat all skeleton data indiscriminately. To address these limitations and obtain a flexible graph representation of skeleton features, we propose a novel semantics-guided graph convolutional network (Sem-GCN) for skeleton-based action recognition. Sem-GCN employs three types of semantic graph modules (structural graph extraction module, actional graph inference module, and attention graph iteration module), which respectively aggregate information from L-hop joint neighbors, capture action-specific latent dependencies, and assign importance levels to joints. Combining these semantic graphs into a generalized skeleton graph, we further propose the semantics-guided graph convolution block, which stacks semantic graph convolution and temporal convolution to learn both semantic and temporal features for action recognition. Experimental results demonstrate the effectiveness of the proposed model on the widely used NTU RGB+D and Kinetics datasets.
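The abstract describes fusing three semantic graphs (structural, actional, attention) into one generalized skeleton graph, with the structural graph covering L-hop neighborhoods. A minimal NumPy sketch of that idea is below; the additive fusion, row normalization, and function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def l_hop_structural_graph(a, L=2):
    """Binary adjacency marking every joint reachable within L hops
    of the physical skeleton graph `a` (the structural graph)."""
    n = a.shape[0]
    reach = np.eye(n, dtype=bool)      # each joint reaches itself
    power = np.eye(n)
    for _ in range(L):
        power = power @ a              # walks of one more hop
        reach |= power > 0
    return reach.astype(float)

def generalized_skeleton_graph(a_struct, a_act, a_attn, L=2):
    """Fuse structural, actional, and attention graphs into one adjacency.
    Additive fusion is an assumption made for illustration."""
    a = l_hop_structural_graph(a_struct, L) + a_act + a_attn
    d = a.sum(axis=1, keepdims=True)   # row-normalize so rows sum to 1
    return a / np.maximum(d, 1e-8)
```

For example, on a 5-joint chain (0-1-2-3-4), `l_hop_structural_graph(a, 2)` connects joint 0 to joints 1 and 2 but not to joint 3.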

Task: Skeleton Based Action Recognition
Dataset: NTU RGB+D
Model: Sem-GCN

Metric          Value   Global Rank
Accuracy (CV)   94.2    #56
Accuracy (CS)   86.2    #71

Methods
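The semantics-guided graph convolution block is described as stacking semantic graph convolution and temporal convolution. A minimal NumPy sketch of one such block follows; the tensor layout, depth-wise 1-D temporal kernel, and interface are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def sem_gcn_block(x, a, w_spatial, w_temporal):
    """One block: semantic graph convolution over joints, then a
    temporal convolution along each joint's frame sequence.

    x: (T, N, C)  frames x joints x channels
    a: (N, N)     generalized (normalized) skeleton graph
    w_spatial: (C, D) channel projection
    w_temporal: (K,)  1-D kernel applied along the time axis
    """
    # spatial step: propagate features along the semantic graph per frame
    h = np.einsum('nm,tmc,cd->tnd', a, x, w_spatial)
    h = np.maximum(h, 0.0)                       # ReLU
    # temporal step: valid 1-D convolution over time, shared across joints
    T, N, D = h.shape
    K = len(w_temporal)
    out = np.zeros((T - K + 1, N, D))
    for t in range(T - K + 1):
        out[t] = np.tensordot(w_temporal, h[t:t + K], axes=(0, 0))
    return out
```

With an identity graph, identity projection, and a length-1 kernel, the block reduces to a per-frame ReLU, which makes the data flow easy to check.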