NTU RGB+D

Introduced by Shahroudy et al. in NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

NTU RGB+D is a large-scale dataset for RGB-D human action recognition. It contains 56,880 samples of 60 action classes collected from 40 subjects. The actions fall into three broad categories: 40 daily actions (e.g., drinking, eating, reading), nine health-related actions (e.g., sneezing, staggering, falling down), and 11 mutual actions (e.g., punching, kicking, hugging). They were recorded under 17 different scene conditions, corresponding to 17 video sequences (i.e., S001–S017), using three cameras with different horizontal viewpoints: −45°, 0°, and +45°. Four modalities are provided for action characterization: depth maps, 3D skeleton joint positions, RGB frames, and infrared sequences. Performance is evaluated with a cross-subject test, which splits the 40 subjects into training and test groups, and a cross-view test, which uses one camera (+45°) for testing and the other two cameras for training.

Source: Action Recognition for Depth Video using Multi-view Dynamic Images
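
The two evaluation protocols described above can be sketched in a few lines of Python. This is a minimal illustration, not the official split code. It assumes each sample name follows the pattern SsssCcccPpppRrrrAaaa (setup, camera, subject, replication, action), which is not stated on this page. The function names, the choice of camera ID 1 as the held-out view, and the subject IDs in the usage lines are hypothetical; substitute the camera that corresponds to +45° and the subject list from the dataset's documentation.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# Assumed naming pattern: S001C002P003R001A010 (not confirmed by this page).
_NAME = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")


class SampleId(NamedTuple):
    setup: int
    camera: int
    subject: int
    replication: int
    action: int


def parse_sample_name(name: str) -> SampleId:
    """Extract the numeric fields from a sample name or file path."""
    match = _NAME.search(name)
    if match is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    return SampleId(*(int(group) for group in match.groups()))


def cross_view_split(
    names: Iterable[str], test_camera: int
) -> Tuple[List[str], List[str]]:
    """Hold out one camera for testing; train on the remaining cameras."""
    train, test = [], []
    for name in names:
        target = test if parse_sample_name(name).camera == test_camera else train
        target.append(name)
    return train, test


def cross_subject_split(
    names: Iterable[str], train_subjects: Iterable[int]
) -> Tuple[List[str], List[str]]:
    """Train on the listed subject IDs; test on all other subjects."""
    train_ids = set(train_subjects)
    train, test = [], []
    for name in names:
        target = train if parse_sample_name(name).subject in train_ids else test
        target.append(name)
    return train, test


# Hypothetical sample names and split parameters, for illustration only.
names = [
    "S001C001P001R001A001",
    "S001C002P001R001A001",
    "S001C003P002R001A001",
]
view_train, view_test = cross_view_split(names, test_camera=1)
subj_train, subj_test = cross_subject_split(names, train_subjects=[1])
```

Keeping the held-out camera and the training subjects as parameters avoids hard-coding split details that this page does not specify.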

Benchmarks

Task	Dataset Variant	Best Model
Skeleton Based Action Recognition	NTU RGB+D	Hulk
Action Recognition	NTU RGB+D	PoseC3D
3D Action Recognition	NTU RGB+D	Kinet
Human Interaction Recognition	NTU RGB+D	SkateFormer
Human action generation	NTU RGB+D	CSGN
Pose Prediction	Filtered NTU RGB+D	PISEP^2
Generalized Zero Shot skeletal action recognition	NTU RGB+D	SynSE
Unsupervised Skeleton Based Action Recognition	NTU RGB+D	BRL
Action Recognition In Videos	NTU RGB+D	2D-3D-Softargmax
Early Action Prediction	NTU RGB+D	TemPr4
Zero Shot Skeletal Action Recognition	NTU RGB+D	SynSE