Recognizing Human Actions as the Evolution of Pose Estimation Maps

CVPR 2018 · Mengyuan Liu, Junsong Yuan

Most video-based action recognition approaches extract features from the whole video. Cluttered backgrounds and non-action motions limit the performance of these methods, since they lack explicit modeling of human body movements. Building on recent advances in human pose estimation, this work presents a novel method that recognizes human actions as the evolution of pose estimation maps. Instead of relying on the inaccurate human poses estimated from videos, we observe that pose estimation maps, the byproduct of pose estimation, preserve richer cues of the human body that benefit action recognition. Specifically, the evolution of pose estimation maps can be decomposed into an evolution of heatmaps, i.e., probabilistic maps, and an evolution of estimated 2D human poses, which denote the changes of body shape and body pose, respectively. Considering the sparsity of heatmaps, we develop spatial rank pooling to aggregate the evolution of heatmaps into a body shape evolution image. As the body shape evolution image does not differentiate body parts, we design body guided sampling to aggregate the evolution of poses into a body pose evolution image. The complementary properties of the two types of images are explored by deep convolutional neural networks to predict the action label. Experiments on the NTU RGB+D, UTD-MHAD and PennAction datasets verify the effectiveness of our method, which outperforms most state-of-the-art methods.
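
To make the idea of collapsing a heatmap sequence into a single image concrete, the sketch below uses approximate rank pooling with the closed-form frame weights alpha_t = 2t - T - 1 known from the dynamic image literature. This is a swapped-in simplification, not the spatial rank pooling proposed in the paper, whose details are not given in the abstract. The function names `approximate_rank_pooling` and `to_uint8` and the (T, H, W) input layout are assumptions made for illustration.

```python
import numpy as np


def approximate_rank_pooling(heatmaps):
    """Collapse a (T, H, W) heatmap sequence into one (H, W) image.

    Assumption: uses the simplified closed-form weights alpha_t = 2t - T - 1
    (t = 1..T), a stand-in for the paper's spatial rank pooling.
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float64)
    num_frames = heatmaps.shape[0]
    # Early frames get negative weights, late frames positive ones,
    # so the pooled image encodes temporal order per pixel.
    alphas = 2.0 * np.arange(1, num_frames + 1) - num_frames - 1
    # Weighted sum over the time axis.
    return np.tensordot(alphas, heatmaps, axes=(0, 0))


def to_uint8(image):
    """Min-max rescale an image to 0..255 so a CNN can consume it."""
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros(image.shape, dtype=np.uint8)
    return np.round(255.0 * (image - lo) / (hi - lo)).astype(np.uint8)


# Toy sequence: a response at pixel (0, 0) in the first frame
# moves to pixel (1, 1) in the last frame.
sequence = np.zeros((3, 2, 2))
sequence[0, 0, 0] = 1.0
sequence[2, 1, 1] = 1.0
pooled = approximate_rank_pooling(sequence)
image = to_uint8(pooled)
```

In this toy case the pixel that is active early receives a negative value and the pixel that is active late receives a positive value, so a single image retains the direction of motion. A static sequence pools to all zeros because the weights sum to zero.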

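| Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank |
|---|---|---|---|---|---|
| Action Recognition | NTU RGB+D | PoseMap (RGB+Pose) | Accuracy (CS) | 91.7 | # 12 |
| Action Recognition | NTU RGB+D | PoseMap (RGB+Pose) | Accuracy (CV) | 95.2 | # 9 |
| Action Recognition | NTU RGB+D 120 | Body Pose Evolution Map | Accuracy (Cross-Subject) | 66.9 | # 10 |
| Action Recognition | NTU RGB+D 120 | Body Pose Evolution Map | Accuracy (Cross-Setup) | 64.6 | # 11 |
| Skeleton Based Action Recognition | NTU RGB+D 120 | Body Pose Evolution Map | Accuracy (Cross-Subject) | 64.6 | # 36 |
| Skeleton Based Action Recognition | NTU RGB+D 120 | Body Pose Evolution Map | Accuracy (Cross-Setup) | 66.9 | # 33 |
| Multimodal Activity Recognition | UTD-MHAD | PoseMap | Accuracy (CS) | 94.5 | # 1 |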

