DeepActsNet: Spatial and Motion features from Face, Hands, and Body Combined with Convolutional and Graph Networks for Improved Action Recognition

Existing action recognition methods mainly focus on joint and bone information in human body skeleton data due to its robustness to complex backgrounds and dynamic characteristics of the environments. In this paper, we combine body skeleton data with spatial and motion features from face and two hands, and present "Deep Action Stamps (DeepActs)", a novel data representation to encode actions from video sequences. We also present "DeepActsNet", a deep learning based ensemble model which learns convolutional and structural features from Deep Action Stamps for highly accurate action recognition. Experiments on three challenging action recognition datasets (NTU60, NTU120, and SYSU) show that the proposed model trained using Deep Action Stamps produce considerable improvements in the action recognition accuracy with less computational cost compared to the state-of-the-art methods.

Results in Papers With Code
(↓ scroll down to see all results)