Moving poselets: A discriminative and interpretable skeletal motion representation for action recognition

Given a video or a time series of skeleton data, action recognition systems perform classification using cues such as motion, appearance, and pose. For the past decade, actions have been modeled using low-level feature representations such as Bag of Features. More recent work has shown that mid-level representations modeling body part movements (e.g., a hand moving forward) can be very effective. However, these mid-level features are usually hand-crafted, and the dictionary of representative features is learned using ad-hoc heuristics. While automatic feature learning methods such as supervised sparse dictionary learning or neural networks can be applied to learn feature representations and action classifiers jointly, the resulting features are usually uninterpretable. In contrast, our goal is to develop a principled feature learning framework that learns discriminative and interpretable skeletal motion patterns for action recognition. To this end, we propose a novel body-part-motion-based feature called the Moving Poselet, which corresponds to a specific body part configuration undergoing a specific movement. We also propose a simple algorithm for jointly learning Moving Poselets and action classifiers. Experiments on the MSR Action3D, MSR DailyActivity3D, and Berkeley MHAD datasets show that our two-layer model outperforms other two-layer models that use hand-crafted features, and achieves results comparable to those of recent multi-layer Hierarchical Recurrent Neural Network (HRNN) models, which use multiple layers of RNNs to model the human body hierarchy.
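
The paper itself is not accompanied by code here; the following is a minimal PyTorch sketch of one way a two-layer model of this kind could be structured: a bank of linear "moving poselet" detectors per body part scores short skeleton snippets, the scores are max-pooled over time, and a linear action classifier sits on top, with both layers trained jointly. All class names, tensor shapes, and hyperparameters (num_parts, snippet_dim, filters_per_part) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MovingPoseletSketch(nn.Module):
    def __init__(self, num_parts=5, snippet_dim=90, filters_per_part=20, num_actions=10):
        super().__init__()
        # Layer 1: one bank of linear "moving poselet" detectors per body part.
        self.detectors = nn.ModuleList(
            nn.Linear(snippet_dim, filters_per_part) for _ in range(num_parts)
        )
        # Layer 2: linear action classifier over the concatenated, pooled detector scores.
        self.classifier = nn.Linear(num_parts * filters_per_part, num_actions)

    def forward(self, snippets):
        # snippets: list of length num_parts, each (batch, num_snippets, snippet_dim),
        # i.e., short temporal windows of one body part's joint positions/velocities.
        pooled = []
        for part_snippets, detector in zip(snippets, self.detectors):
            scores = detector(part_snippets)          # (batch, num_snippets, filters_per_part)
            pooled.append(scores.max(dim=1).values)   # max-pool detector responses over time
        features = torch.cat(pooled, dim=1)           # mid-level sequence descriptor
        return self.classifier(features)              # action logits

# Joint training: poselet detectors and the action classifier are updated together.
model = MovingPoseletSketch()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

snippets = [torch.randn(4, 30, 90) for _ in range(5)]  # dummy batch of 4 skeleton sequences
labels = torch.randint(0, 10, (4,))                    # dummy action labels
optimizer.zero_grad()
loss = loss_fn(model(snippets), labels)
loss.backward()
optimizer.step()
```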

Datasets

MSR Action3D, MSR DailyActivity3D, Berkeley MHAD

Results

Task | Dataset | Model | Metric | Value | Global Rank
Multimodal Activity Recognition | MSR Daily Activity3D | Moving Poselets | Accuracy | 74.5 | #6
