Non-Local Graph Convolutional Networks for Skeleton-Based Action Recognition
Traditional deep methods for skeleton-based action recognition usually structure the skeleton as a coordinates sequence or a pseudo-image to feed to RNNs or CNNs, which cannot explicitly exploit the natural connectivity among the joints. Recently, graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, obtains remarkable performance for skeleton-based action recognition. However, the topology of the graph is set by hand and fixed over all layers, which may be not optimal for the action recognition task and the hierarchical CNN structures. Besides, the first-order information (the coordinate of joints) is mainly used in former GCNs, while the second-order information (the length and direction of bones) is less exploited. In this work, a novel two-stream nonlocal graph convolutional network is proposed to solve these problems. The topology of the graph in each layer of the model can be either uniformly or individually learned by BP algorithm, which brings more flexibility and generality. Meanwhile, a two-stream framework is proposed to model both of the joints and bones information simultaneously, which further boost the recognition performance. Extensive experiments on two large-scale datasets, NTU-RGB+D and Kinetics, demonstrate the performance of our model exceeds the state-of-the-art by a significant margin.
PDF Abstract