FreiHAND is a 3D hand pose dataset that records different hand actions performed by 32 people. For each hand image, MANO-based 3D hand pose annotations are provided. It currently contains 32,560 unique training samples and 3,960 unique samples for evaluation. The training samples are recorded with a green-screen background, allowing for background removal. In addition, three different post-processing strategies are applied to the training samples for data augmentation; these strategies are not applied to the evaluation samples.
120 PAPERS • 1 BENCHMARK
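The FreiHAND annotations pair per-sample 3D keypoints with per-sample camera intrinsics, so the 3D pose can be projected into the image with a standard pinhole model. Below is a minimal sketch of this; the file names (`training_xyz.json`, `training_K.json`) follow the layout of the public release but are assumptions here, not guaranteed by this description.

```python
import json
import numpy as np

def load_json(path):
    with open(path, "r") as f:
        return json.load(f)

# Assumed release layout: one 21x3 keypoint list (metres) and one 3x3
# intrinsic matrix per training sample.
xyz_list = load_json("training_xyz.json")
K_list = load_json("training_K.json")

def project(xyz, K):
    """Pinhole projection of Nx3 camera-space points to Nx2 pixel coordinates."""
    uv = np.asarray(xyz) @ np.asarray(K).T   # (N, 3) homogeneous image points
    return uv[:, :2] / uv[:, 2:3]            # divide by depth

uv0 = project(xyz_list[0], K_list[0])        # 21x2 pixel locations for sample 0
print(uv0.shape)
```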
DexYCB is a dataset for capturing hand grasping of objects. It can be used for three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation.
90 PAPERS • 2 BENCHMARKS
AGORA is a synthetic human dataset with high realism and accurate ground truth. It consists of around 14K training and 3K test images, rendered with between 5 and 15 people per image using either image-based lighting or rendered 3D environments, taking care to make the images physically plausible and photorealistic. In total, AGORA contains 173K individual person crops. AGORA provides (1) SMPL/SMPL-X parameters and (2) segmentation masks for each subject in the images.
67 PAPERS • 4 BENCHMARKS
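Since AGORA's ground truth is expressed as SMPL/SMPL-X parameters, a posed mesh can be recovered with the smplx library. The sketch below assumes locally downloaded SMPL-X model files and uses zeroed placeholder parameters rather than actual AGORA annotations; it only illustrates how parameters map back to vertices and joints.

```python
import torch
import smplx  # https://github.com/vchoutas/smplx

# "models" is a placeholder path to the downloaded SMPL-X model files.
model = smplx.create("models", model_type="smplx", gender="neutral", num_betas=10)

betas = torch.zeros(1, 10)          # shape coefficients (placeholder values)
global_orient = torch.zeros(1, 3)   # root orientation, axis-angle
body_pose = torch.zeros(1, 63)      # 21 body joints x 3, axis-angle

output = model(betas=betas, global_orient=global_orient, body_pose=body_pose)
vertices = output.vertices.detach()[0]   # (10475, 3) SMPL-X mesh vertices
joints = output.joints.detach()[0]       # 3D joint locations
print(vertices.shape, joints.shape)
```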
The InterHand2.6M dataset is a large-scale real-captured dataset with accurate ground-truth 3D interacting hand poses, used for 3D hand pose estimation. The dataset contains 2.6M labeled single- and interacting-hand frames.
50 PAPERS • 2 BENCHMARKS
A hand-object interaction dataset with 3D pose annotations of the hand and object. The dataset contains 66,034 training images and 11,524 test images from a total of 68 sequences. The sequences are captured in multi-camera and single-camera setups and contain 10 different subjects manipulating 10 different objects from the YCB dataset. The annotations are obtained automatically using an optimization algorithm. The hand pose annotations for the test set are withheld; the accuracy of algorithms on the test set can be evaluated with standard metrics via the CodaLab challenge submission (see the project page). The object pose annotations for both the training and test sets are provided with the dataset.
35 PAPERS • 2 BENCHMARKS
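As a reference point for what "standard metrics" typically means for 3D hand pose, below is a minimal sketch of mean per-joint position error after root (wrist) alignment. It illustrates the style of metric used in such challenge evaluations and does not reproduce the exact CodaLab protocol.

```python
import numpy as np

def root_aligned_mpjpe(pred, gt, root_idx=0):
    """Mean per-joint error after subtracting the root joint.

    pred, gt: (21, 3) arrays of joint positions in the same units (e.g. mm).
    """
    pred = np.asarray(pred) - np.asarray(pred)[root_idx]
    gt = np.asarray(gt) - np.asarray(gt)[root_idx]
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Purely illustrative example with random joints.
rng = np.random.default_rng(0)
gt = rng.normal(size=(21, 3))
pred = gt + rng.normal(scale=0.01, size=(21, 3))
print(root_aligned_mpjpe(pred, gt))
```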
A curated dataset of SMPL-X fits on in-the-wild images.
31 PAPERS • NO BENCHMARKS YET
3D Hand Pose is a multi-view hand pose dataset consisting of color images of hands and several kinds of annotations for each image: the hand bounding box and the 2D and 3D locations of the hand joints.
16 PAPERS • NO BENCHMARKS YET
First-Person Hand Action Benchmark is a collection of RGB-D video sequences comprising more than 100K frames of 45 daily hand action categories, involving 26 different objects in several hand configurations.
15 PAPERS • 2 BENCHMARKS
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects. To this end, we propose a method to create a unified dataset for egocentric 3D interaction recognition. Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame. Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left and right hands, 6D object poses, ground-truth camera poses, object meshes, and scene point clouds. To the best of our knowledge, this is the first benchmark that enables the study of first-person actions using the poses of both left and right hands manipulating objects, and it presents an unprecedented level of detail for egocentric 3D interaction recognition. We further propose a method to predict interaction classes by estimating the 3D poses of both hands and the 6D poses of the manipulated objects.
14 PAPERS • 2 BENCHMARKS
The EgoDexter dataset provides both 2D and 3D pose annotations for 4 testing video sequences with 3,190 frames. The videos are recorded with a body-mounted camera from egocentric viewpoints and contain cluttered backgrounds, fast camera motion, and complex interactions with various objects. Fingertip positions were manually annotated for 1,485 of the 3,190 frames.
12 PAPERS • NO BENCHMARKS YET
HO-3D v3 is version 3 of the HO-3D dataset with more accurate hand-object poses. Compared to HO-3D v2, it provides more accurate annotations for both the hand and object poses, resulting in better estimates of the contact regions between the hand and the object.
10 PAPERS • 1 BENCHMARK
Human3.6M 3D WholeBody (H3WB) is a large-scale dataset with 133 whole-body keypoint annotations on 100K images, made possible by a new multi-view pipeline. It is designed for three new tasks: (i) 3D whole-body pose lifting from a complete 2D whole-body pose, (ii) 3D whole-body pose lifting from an incomplete 2D whole-body pose, and (iii) 3D whole-body pose estimation from a single RGB image.
8 PAPERS • 3 BENCHMARKS
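To make task (i) concrete, a 2D-to-3D lifting model maps the 133 whole-body keypoints in image space to 133 keypoints in 3D. The sketch below is a small illustrative MLP under assumed layer sizes; it is not the benchmark's reference architecture.

```python
import torch
import torch.nn as nn

class WholeBodyLifter(nn.Module):
    """Lifts (batch, 133, 2) 2D keypoints to (batch, 133, 3) 3D keypoints."""
    def __init__(self, num_joints=133, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_joints * 3),
        )

    def forward(self, kp2d):
        b = kp2d.shape[0]
        out = self.net(kp2d.reshape(b, -1))  # flatten to (batch, 266)
        return out.reshape(b, -1, 3)         # (batch, 133, 3)

lifter = WholeBodyLifter()
pred3d = lifter(torch.randn(4, 133, 2))
print(pred3d.shape)  # torch.Size([4, 133, 3])
```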
The HInt dataset is frequently used as a generalizability benchmark for 3D hand reconstruction. It features three subsets, HInt-NewDays, HInt-VISOR, and HInt-Ego4D, and aims to complement existing datasets used for training and evaluating 3D hand pose estimation. HInt annotates 2D locations and occlusion labels for 21 hand keypoints. It is built on three existing datasets (Hands23, Epic-Kitchens VISOR, and Ego4D) and provides new annotations for images drawn from them.
7 PAPERS • 1 BENCHMARK
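The HInt annotation described above amounts to 21 2D keypoints plus a per-keypoint occlusion flag, which naturally supports 2D accuracy metrics restricted to visible joints. Below is a minimal sketch with hypothetical field names and an assumed pixel threshold; it is not the dataset's official format or evaluation script.

```python
import numpy as np

# Hypothetical per-hand annotation record (field names are assumptions).
annotation = {
    "keypoints_2d": np.zeros((21, 2)),       # pixel coordinates of 21 hand keypoints
    "occluded": np.zeros(21, dtype=bool),    # True where the keypoint is occluded
}

def pck_visible(pred_2d, ann, threshold_px=10.0):
    """Fraction of visible keypoints predicted within threshold_px pixels."""
    visible = ~ann["occluded"]
    if not visible.any():
        return float("nan")
    err = np.linalg.norm(pred_2d[visible] - ann["keypoints_2d"][visible], axis=-1)
    return float((err < threshold_px).mean())

print(pck_visible(np.ones((21, 2)), annotation))
```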
The SynthHands dataset is a hand pose estimation dataset consisting of real captured hand motion retargeted to a virtual hand with natural backgrounds and interactions with different objects. The dataset contains data for male and female hands, both with and without object interaction. While the hand and foreground object are synthetically rendered using Unity, the motion was obtained from real performances as described in the accompanying paper. In addition, real object textures and background images (depth and color) were used. Ground-truth 3D positions are provided for 21 keypoints of the hand.
7 PAPERS • NO BENCHMARKS YET
The ICVL dataset is a hand pose estimation dataset that consists of 330K training frames and 2 testing sequences of 800 frames each. The dataset is collected from 10 different subjects, with 16 hand joint annotations for each frame.
3 PAPERS • NO BENCHMARKS YET
MuViHand is a dataset for 3D hand pose estimation that consists of multi-view videos of the hand along with ground-truth 3D pose labels. The dataset includes more than 402,000 synthetic hand images across 4,560 videos. The videos have been captured simultaneously from six different angles with complex backgrounds and random levels of dynamic lighting. The data has been captured from 10 distinct animated subjects using 12 cameras in a semi-circle topology.
2 PAPERS • NO BENCHMARKS YET
A large-scale hand pose dataset, collected using a novel capture method.
1 PAPER • NO BENCHMARKS YET
ThermoHands is the first benchmark dataset specifically designed for egocentric 3D hand pose estimation from thermal images. It addresses the challenges of hand pose estimation in low-light conditions and when the hand is occluded by gloves or other wearables—scenarios where traditional RGB or NIR-based systems struggle.