6 dataset results for Temporal Action Localization AND Images

The MPII Human Pose Dataset for single person pose estimation is composed of about 25K images of which 15K are training samples, 3K are validation samples and 7K are testing samples (which labels are withheld by the authors). The images are taken from YouTube videos covering 410 different human activities and the poses are manually annotated with up to 16 body joints.

464 PAPERS • 4 BENCHMARKS

UTD-MHAD

The UTD-MHAD dataset consists of 27 different actions performed by 8 subjects. Each subject repeated the action for 4 times, resulting in 861 action sequences in total. The RGB, depth, skeleton and the inertial sensor signals were recorded.

55 PAPERS • 2 BENCHMARKS

Florence3D

The dataset collected at the University of Florence during 2012, has been captured using a Kinect camera. It includes 9 activities: wave, drink from a bottle, answer phone,clap, tight lace, sit down, stand up, read watch, bow. During acquisition, 10 subjects were asked to perform the above actions for 2/3 times. This resulted in a total of 215 activity samples.

18 PAPERS • 1 BENCHMARK

TUM Kitchen

The TUM Kitchen dataset is an action recognition dataset that contains 20 video sequences captured by 4 cameras with overlapping views. The camera network captures the scene from four viewpoints with 25 fps, and every RGB frame is of the resolution 384×288 by pixels. The action labels are frame-wise, and provided for the left arm, the right arm and the torso separately.

7 PAPERS • NO BENCHMARKS YET

UAV-GESTURE

UAV-GESTURE is a dataset for UAV control and gesture recognition. It is an outdoor recorded video dataset for UAV commanding signals with 13 gestures suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals. It contains 119 high-definition video clips consisting of 37,151 frames.

3 PAPERS • NO BENCHMARKS YET

RISE

RISE is a large-scale video dataset for Recognizing Industrial Smoke Emissions. A citizen science approach was adopted to collaborate with local community members to annotate whether a video clip has smoke emissions. The dataset contains 12,567 clips from 19 distinct views from cameras that monitored three industrial facilities. These daytime clips span 30 days over two years, including all four seasons.

2 PAPERS • NO BENCHMARKS YET

Datasets

6 dataset results for Temporal Action Localization AND Images