The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part locations, and the associated gesture to be recognized by the system. The data set includes 594 sequences and 719,359 frames—approximately six hours and 40 minutes—collected from 30 people performing 12 gestures. In total, there are 6,244 gesture instances. The motion files contain tracks of 20 joints estimated using the Kinect Pose Estimation pipeline. The body poses are captured at a sample rate of 30Hz with an accuracy of about two centimeters in joint positions.
31 PAPERS • 2 BENCHMARKS
The Gaming 3D Dataset (G3D) focuses on real-time action recognition in a gaming scenario. It contains 10 subjects performing 20 gaming actions: “punch right”, “punch left”, “kick right”, “kick left”, “defend”, “golf swing”, “tennis swing forehand”, “tennis swing backhand”, “tennis serve”, “throw bowling ball”, “aim and fire gun”, “walk”, “run”, “jump”, “climb”, “crouch”, “steer a car”, “wave”, “flap” and “clap”.
27 PAPERS • 2 BENCHMARKS
The Drive&Act dataset is a state of the art multi modal benchmark for driver behavior recognition. The dataset includes 3D skeletons in addition to frame-wise hierarchical labels of 9.6 Million frames captured by 6 different views and 3 modalities (RGB, IR and depth).
18 PAPERS • 1 BENCHMARK
The dataset collected at the University of Florence during 2012, has been captured using a Kinect camera. It includes 9 activities: wave, drink from a bottle, answer phone,clap, tight lace, sit down, stand up, read watch, bow. During acquisition, 10 subjects were asked to perform the above actions for 2/3 times. This resulted in a total of 215 activity samples.
This is a 3D action recognition dataset, also known as 3D Action Pairs dataset. The actions in this dataset are selected in pairs such that the two actions of each pair are similar in motion (have similar trajectories) and shape (have similar objects); however, the motion-shape relation is different.
7 PAPERS • 1 BENCHMARK