HOPE-Image (Household Objects for Pose Estimation)

Introduced by Tyree et al. in "6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An Accessible Dataset and Benchmark"

The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.

The HOPE-Image dataset shows the objects in 50 scenes from 10 household/office environments. It contains 188 test images taken in 8 of those environments, spanning 40 scenes (each a unique combination of camera and object poses). Up to 5 lighting variations are captured for each scene, including backlighting and angled direct lighting with cast shadows. Scenes are cluttered with varying levels of occlusion.

An additional 50 validation images are included from 2 environments in 10 scene arrangements.

Within each scene, up to 5 lighting variations are captured with the same camera and object poses. For example, the captures in valid/scene_0000/*.json all depict the same camera pose and arrangement of objects, but each individual capture (0000.json, 0001.json, ...) has a different lighting condition. For this reason, each image should be treated independently for purposes of pose prediction. The most favorable lighting condition for each scene is found in image 0000.json.

Images were captured using a RealSense D415 RGBD camera. Systematic errors were observed in the depth values relative to the estimated distance of a calibration grid. To correct for this, depth frames are scaled by a factor of 0.98042517 before registering to RGB. Annotations were made manually using these corrected RGBD frames.
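Applying the stated correction is a single multiplication. The sketch below assumes the raw depth frame is a uint16 image in millimeters, which is typical for D415 output but not stated in the text:

```python
import numpy as np

# Correction factor reported for the RealSense D415 depth frames.
DEPTH_SCALE = 0.98042517

def correct_depth(depth_mm):
    """Scale a raw depth frame before registering it to RGB.

    `depth_mm` is assumed to be a uint16 array in millimeters; the
    result is float32 so sub-millimeter precision is preserved.
    """
    return depth_mm.astype(np.float32) * DEPTH_SCALE
```

Note that the provided annotations were made against frames corrected this way, so the same factor should be applied before comparing predicted poses to depth measurements.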

NOTE: Only validation set annotations are included. Test annotations are managed by the BOP challenge.
