11 dataset results for Skeleton Based Action Recognition AND Images

The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands.

607 PAPERS • 6 BENCHMARKS

Penn Action

The Penn Action Dataset contains 2326 video sequences of 15 different actions and human joint annotations for each sequence.

102 PAPERS • 4 BENCHMARKS

UT-Kinect (UTKinect-Action3D Dataset)

The UT-Kinect dataset is a dataset for action recognition from depth sequences. The videos were captured using a single stationary Kinect. There are 10 action types: walk, sit down, stand up, pick up, carry, throw, push, pull, wave hands, clap hands. There are 10 subjects, Each subject performs each actions twice. Three channels were recorded: RGB, depth and skeleton joint locations. The three channel are synchronized. The framerate is 30f/s.

70 PAPERS • 2 BENCHMARKS

CAD-120

The CAD-60 and CAD-120 data sets comprise of RGB-D video sequences of humans performing activities which are recording using the Microsoft Kinect sensor. Being able to detect human activities is important for making personal assistant robots useful in performing assistive tasks. The CAD dataset comprises twelve different activities (composed of several sub-activities) performed by four people in different environments, such as a kitchen, a living room, and office, etc.

62 PAPERS • 1 BENCHMARK

PKU-MMD

The PKU-MMD dataset is a large skeleton-based action detection dataset. It contains 1076 long untrimmed video sequences performed by 66 subjects in three camera views. 51 action categories are annotated, resulting almost 20,000 action instances and 5.4 million frames in total. Similar to NTU RGB+D, there are also two recommended evaluate protocols, i.e. cross-subject and cross-view.

49 PAPERS • 3 BENCHMARKS

MSRC-12

MSRC-12 (MSRC-12 Kinect Gesture Dataset)

The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part locations, and the associated gesture to be recognized by the system. The data set includes 594 sequences and 719,359 frames—approximately six hours and 40 minutes—collected from 30 people performing 12 gestures. In total, there are 6,244 gesture instances. The motion files contain tracks of 20 joints estimated using the Kinect Pose Estimation pipeline. The body poses are captured at a sample rate of 30Hz with an accuracy of about two centimeters in joint positions.

31 PAPERS • 2 BENCHMARKS

G3D (Gaming 3D Dataset)

The Gaming 3D Dataset (G3D) focuses on real-time action recognition in a gaming scenario. It contains 10 subjects performing 20 gaming actions: “punch right”, “punch left”, “kick right”, “kick left”, “defend”, “golf swing”, “tennis swing forehand”, “tennis swing backhand”, “tennis serve”, “throw bowling ball”, “aim and fire gun”, “walk”, “run”, “jump”, “climb”, “crouch”, “steer a car”, “wave”, “flap” and “clap”.

27 PAPERS • 2 BENCHMARKS

Florence3D

The dataset collected at the University of Florence during 2012, has been captured using a Kinect camera. It includes 9 activities: wave, drink from a bottle, answer phone,clap, tight lace, sit down, stand up, read watch, bow. During acquisition, 10 subjects were asked to perform the above actions for 2/3 times. This resulted in a total of 215 activity samples.

18 PAPERS • 1 BENCHMARK

HDM05

HDM05 is a MoCap (motion capture) dataset. It contains more than three hours of systematically recorded and well-documented motion capture data in the C3D as well as in the ASF/AMC data format. HDM05 contains almost 2337 sequences with 130 motion classes performed by 5 different actors.

3 PAPERS • 1 BENCHMARK

TCG (Traffic Control Gesture)

The TCG dataset is used to evaluate Traffic Control Gesture recognition for autonomous driving. The dataset is based on 3D body skeleton input to perform traffic control gesture classification on every time step. The dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence.

2 PAPERS • 1 BENCHMARK

ANUBIS

ANUBIS (Skeleton-Based Action Recognition Dataset)

ANUBIS is a large-scale human skeleton dataset containing 80 actions. Compared with previously collected datasets, ANUBIS is advantageous in the following four aspects: (1) employing more recently released sensors; (2) containing novel back view; (3) encouraging high enthusiasm of subjects; (4) including actions of the COVID pandemic era.

1 PAPER • NO BENCHMARKS YET

Datasets

11 dataset results for Skeleton Based Action Recognition AND Images