Feedback Graph Convolutional Network for Skeleton-based Action Recognition

17 Mar 2020  ·  Hao Yang, Dan Yan, Li Zhang, Dong Li, YunDa Sun, ShaoDi You, Stephen J. Maybank

Skeleton-based action recognition has attracted considerable attention in computer vision, since skeleton data are more robust to dynamic circumstances and complicated backgrounds than other modalities. Recently, many researchers have used the Graph Convolutional Network (GCN) to model the spatial-temporal features of skeleton sequences with end-to-end optimization. However, conventional GCNs are feedforward networks, in which the low-level layers cannot access the semantic information contained in the high-level layers. In this paper, we propose a novel network, named the Feedback Graph Convolutional Network (FGCN). This is the first work to introduce a feedback mechanism into GCNs for action recognition. Compared with conventional GCNs, FGCN has the following advantages: (1) a multi-stage temporal sampling strategy is designed to extract spatial-temporal features for action recognition in a coarse-to-fine progressive process; (2) a Feedback Graph Convolutional Block (FGCB), based on dense connections, is proposed to introduce feedback connections into GCNs. It transmits high-level semantic features to the low-level layers and propagates temporal information stage by stage to progressively model global spatial-temporal features for action recognition; (3) the FGCN model provides early predictions. In the early stages, the model has received only partial information about an action, so its predictions are relatively coarse. These coarse predictions are treated as priors that guide the feature learning of later stages toward an accurate final prediction. Extensive experiments on the NTU-RGB+D, NTU-RGB+D120 and Northwestern-UCLA datasets demonstrate that the proposed FGCN is effective for action recognition, achieving state-of-the-art performance on all three datasets.
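The feedback idea described above can be illustrated with a minimal PyTorch sketch. Everything in the code below is an assumption for illustration, not the authors' implementation: the class names (`GraphConv`, `FeedbackBlock`, `FGCNSketch`), the plain `A·X·W` graph convolution, the per-stage temporal pooling, and the averaging of stage-wise logits are placeholders standing in for the paper's multi-stage temporal sampling, dense-connection FGCB, and early-prediction fusion.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Basic spatial graph convolution, assumed form: X' = relu(A X W)."""
    def __init__(self, in_ch, out_ch, adj):
        super().__init__()
        self.register_buffer("adj", adj)   # normalized joint adjacency (J, J)
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):                  # x: (batch, joints, channels)
        return torch.relu(self.proj(self.adj @ x))

class FeedbackBlock(nn.Module):
    """Hypothetical stand-in for the FGCB: the current stage's low-level
    features are fused with the high-level state fed back from the
    previous stage before the graph convolution."""
    def __init__(self, ch, adj):
        super().__init__()
        self.gcn = GraphConv(2 * ch, ch, adj)

    def forward(self, x, hidden):
        return self.gcn(torch.cat([x, hidden], dim=-1))  # feedback fusion

class FGCNSketch(nn.Module):
    def __init__(self, ch, adj, num_classes, num_stages=4):
        super().__init__()
        self.embed = GraphConv(3, ch, adj)   # 3-D joint coordinates in
        self.block = FeedbackBlock(ch, adj)  # shared across stages
        self.head = nn.Linear(ch, num_classes)
        self.num_stages = num_stages

    def forward(self, clip):                 # clip: (batch, T, joints, 3)
        # Coarse-to-fine temporal sampling: split the sequence into stages.
        stages = clip.chunk(self.num_stages, dim=1)
        hidden, logits = None, []
        for s in stages:
            x = self.embed(s.mean(dim=1))    # pool each stage over time
            # First stage has no feedback yet; later stages fuse it in.
            hidden = x if hidden is None else self.block(x, hidden)
            # Early prediction from the partial information seen so far.
            logits.append(self.head(hidden.mean(dim=1)))
        # Fuse the coarse-to-fine per-stage predictions.
        return torch.stack(logits).mean(dim=0)
```

As a usage sketch, `FGCNSketch(ch=64, adj=torch.eye(25), num_classes=60)` with an input of shape `(2, 32, 25, 3)` (two clips of 32 frames over the 25 NTU joints) returns a `(2, 60)` tensor of fused logits; a real adjacency matrix would encode the skeleton's bone connections rather than the identity placeholder used here.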

Task: Skeleton Based Action Recognition

Dataset         Model                       Metric                     Value    Global Rank
NTU RGB+D       FGCN-spatial+FGCN-motion    Accuracy (CV)              96.3%    #30
NTU RGB+D       FGCN-spatial+FGCN-motion    Accuracy (CS)              90.2%    #38
NTU RGB+D 120   FGCN                        Accuracy (Cross-Subject)   85.4%    #34
NTU RGB+D 120   FGCN                        Accuracy (Cross-Setup)     87.4%    #34