Global Context-Aware Attention LSTM Networks for 3D Action Recognition

CVPR 2017 · Jun Liu, Gang Wang, Ping Hu, Ling-Yu Duan, Alex C. Kot

Long Short-Term Memory (LSTM) networks have shown superior performance in 3D human action recognition due to their power in modeling the dynamics and dependencies in sequential data. Since not all joints are informative for action analysis, and irrelevant joints often introduce noise, more attention should be paid to the informative ones. However, the original LSTM lacks a strong attention capability. Hence we propose a new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), for 3D action recognition, which is able to selectively focus on the informative joints in the action sequence with the assistance of global contextual information. To achieve a reliable attention representation for the action sequence, we further propose a recurrent attention mechanism for our GCA-LSTM network, in which the attention performance is improved iteratively. Experiments show that our end-to-end network can reliably focus on the most informative joints in each frame of the skeleton sequence. Moreover, our network yields state-of-the-art performance on three challenging datasets for 3D action recognition.
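
The sketch below is a rough PyTorch illustration of the recurrent global-context attention idea the abstract describes: a first LSTM pass produces a global context representation, which is then used to score and softly re-weight the informative parts of the sequence, with the context refined over a few iterations. All names (`GCAAttentionSketch`, `score`, `num_iters`), layer sizes, and design details (mean-pooled initial context, attention over time steps of flattened joint features) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GCAAttentionSketch(nn.Module):
    """Minimal sketch of recurrent global-context attention (assumed
    design, not the paper's exact GCA-LSTM architecture)."""

    def __init__(self, in_dim, hidden_dim, num_iters=2):
        super().__init__()
        # First layer encodes the skeleton sequence.
        self.first_lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        # Second layer re-encodes the attention-weighted sequence.
        self.second_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Scores how informative each step is, given the global context.
        self.score = nn.Linear(2 * hidden_dim, 1)
        self.num_iters = num_iters

    def forward(self, x):
        # x: (batch, time, in_dim), e.g. flattened joint coordinates per frame.
        h, _ = self.first_lstm(x)            # (B, T, H)
        context = h.mean(dim=1)              # initial global context (B, H)
        for _ in range(self.num_iters):      # recurrent attention refinement
            ctx = context.unsqueeze(1).expand_as(h)
            logits = self.score(torch.cat([h, ctx], dim=-1))  # (B, T, 1)
            attn = torch.softmax(logits, dim=1)
            out, _ = self.second_lstm(attn * h)  # softly selected inputs
            context = out[:, -1]             # refined global context
        return context                       # fed to an action classifier
```

As a usage example, `GCAAttentionSketch(in_dim=75, hidden_dim=128)` would accept batches of 25-joint skeletons (25 × 3 coordinates per frame) and return one context vector per sequence for classification.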

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Skeleton Based Action Recognition | NTU RGB+D | GCA-LSTM | Accuracy (CV) | 84.00% | #101 |
| Skeleton Based Action Recognition | NTU RGB+D | GCA-LSTM | Accuracy (CS) | 76.10% | #105 |
| Skeleton Based Action Recognition | NTU RGB+D 120 | GCA-LSTM | Accuracy (Cross-Subject) | 58.3% | #66 |
| Skeleton Based Action Recognition | NTU RGB+D 120 | GCA-LSTM | Accuracy (Cross-Setup) | 59.2% | #65 |

Results from Other Papers


| Task | Dataset | Model | Metric | Value | Rank |
|---|---|---|---|---|---|
| One-Shot 3D Action Recognition | NTU RGB+D 120 | Attention Network | Accuracy | 41.0% | #8 |
| One-Shot 3D Action Recognition | NTU RGB+D 120 | Fully Connected | Accuracy | 42.1% | #7 |
