Aff-Wild is a dataset for emotion recognition from facial images in a variety of head poses, illumination conditions and occlusions.
111 PAPERS • NO BENCHMARKS YET
The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part locations, and the associated gesture to be recognized by the system. The data set includes 594 sequences and 719,359 frames—approximately six hours and 40 minutes—collected from 30 people performing 12 gestures. In total, there are 6,244 gesture instances. The motion files contain tracks of 20 joints estimated using the Kinect Pose Estimation pipeline. The body poses are captured at a sample rate of 30Hz with an accuracy of about two centimeters in joint positions.
31 PAPERS • 2 BENCHMARKS
The IPN Hand dataset is a benchmark video dataset with sufficient size, variation, and real-world elements able to train and evaluate deep neural networks for continuous Hand Gesture Recognition (HGR).
5 PAPERS • NO BENCHMARKS YET
The TCG dataset is used to evaluate Traffic Control Gesture recognition for autonomous driving. The dataset is based on 3D body skeleton input to perform traffic control gesture classification on every time step. The dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence.
2 PAPERS • 1 BENCHMARK
Explicitly created for Human Computer Interaction (HCI).
1 PAPER • NO BENCHMARKS YET
MlGesture is a dataset for hand gesture recognition tasks, recorded in a car with 5 different sensor types at two different viewpoints. The dataset contains over 1300 hand gesture videos from 24 participants and features 9 different hand gesture symbols. One sensor cluster with five different cameras is mounted in front of the driver in the center of the dashboard. A second sensor cluster is mounted on the ceiling looking straight down.
We introduce a large-scale video dataset Slovo for Russian Sign Language task. Slovo dataset size is about 16 GB, and it contains 20400 RGB videos for 1000 sign language gestures from 194 singers. Each class has 20 samples. The dataset is divided into training set and test set by subject user_id. The training set includes 15300 videos, and the test set includes 5100 videos. The total video recording time is ~9.2 hours. About 35% of the videos are recorded in HD format, and 65% of the videos are in FullHD resolution. The average video length with gesture is 50 frames.
1 PAPER • 1 BENCHMARK