NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

CVPR 2016 · Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang

Recent approaches in depth-based human activity analysis have achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large-scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.
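The cross-subject and cross-view criteria mentioned in the abstract can be illustrated with a minimal sketch. Nothing below is taken from this page: the sample-name pattern (`SsssCcccPpppRrrrAaaa`), the list of training subject IDs, and the choice of cameras 2 and 3 for training are assumptions based on the dataset's commonly distributed documentation, and the helper names `parse_name`, `split_of`, and `accuracy` are hypothetical.

```python
import re

# ASSUMPTION: sample names look like "S001C002P003R002A013"
# (setup, camera, performer/subject, replication, action class).
NAME_RE = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")

# ASSUMPTION: training subject IDs for the cross-subject protocol
# (20 of the 40 subjects); the remaining subjects form the test set.
TRAIN_SUBJECTS = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16,
                  17, 18, 19, 25, 27, 28, 31, 34, 35, 38}

# ASSUMPTION: cameras 2 and 3 are used for training in the
# cross-view protocol; camera 1 is held out for testing.
TRAIN_CAMERAS = {2, 3}


def parse_name(name):
    """Extract the integer fields encoded in a sample name."""
    match = NAME_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    setup, camera, subject, replication, action = (int(g) for g in match.groups())
    return {"setup": setup, "camera": camera, "subject": subject,
            "replication": replication, "action": action}


def split_of(name, protocol):
    """Return 'train' or 'test' for a sample under the given protocol."""
    info = parse_name(name)
    if protocol == "cross_subject":
        return "train" if info["subject"] in TRAIN_SUBJECTS else "test"
    if protocol == "cross_view":
        return "train" if info["camera"] in TRAIN_CAMERAS else "test"
    raise ValueError(f"unknown protocol: {protocol!r}")


def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Under these assumptions, a sample recorded by camera 2 from subject 3 falls in the test set for the cross-subject protocol but in the training set for the cross-view protocol, which is why the two criteria are reported separately.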

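Task | Dataset | Model | Metric Name | Metric Value | Global Rank
-----|---------|-------|-------------|--------------|------------
Skeleton Based Action Recognition | CAD-120 | P-LSTM (5-shot) | Accuracy | 68.1% | # 8
Skeleton Based Action Recognition | NTU RGB+D | Deep LSTM | Accuracy (CV) | 67.3 | # 112
Skeleton Based Action Recognition | NTU RGB+D | Deep LSTM | Accuracy (CS) | 60.7 | # 114
Skeleton Based Action Recognition | NTU RGB+D | Part-aware LSTM | Accuracy (CV) | 70.27 | # 111
Skeleton Based Action Recognition | NTU RGB+D | Part-aware LSTM | Accuracy (CS) | 62.93 | # 112
Skeleton Based Action Recognition | NTU RGB+D 120 | Part-Aware LSTM | Accuracy (Cross-Subject) | 25.5% | # 69
Skeleton Based Action Recognition | NTU RGB+D 120 | Part-Aware LSTM | Accuracy (Cross-Setup) | 26.3% | # 68
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | P-LSTM | Accuracy (CS) | 60% | # 4
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | P-LSTM | Accuracy (CV I) | 13% | # 7
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | P-LSTM | Accuracy (CV II) | 33% | # 6
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | P-LSTM | Accuracy (AV I) | 33% | # 6
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | P-LSTM | Accuracy (AV II) | 50% | # 6
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | LSTM | Accuracy (CS) | 56% | # 6
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | LSTM | Accuracy (CV I) | 16% | # 4
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | LSTM | Accuracy (CV II) | 31% | # 7
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | LSTM | Accuracy (AV I) | 31% | # 7
Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | LSTM | Accuracy (AV II) | 68% | # 3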

