The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.
The HOPE-Image dataset depicts the objects in 50 scenes drawn from 10 household/office environments. It contains 188 test images taken in 8 of the environments, spanning 40 scenes (each scene is a unique arrangement of camera and object poses). Up to 5 lighting variations are captured for each scene, including backlighting and angled direct lighting with cast shadows. Scenes are cluttered, with varying levels of occlusion.
An additional 50 validation images are included from 2 environments in 10 scene arrangements.
Within each scene, all lighting variations share the same camera and object poses. For example, the captures in valid/scene_0000/*.json all depict the same camera pose and arrangement of objects, but each individual capture (0000.json, 0001.json, ...) was taken under a different lighting condition. For this reason, each image should be treated independently for purposes of pose prediction.
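The directory layout described above can be traversed with a short helper. This is a minimal sketch, assuming the valid/scene_XXXX/NNNN.json layout shown above; the function name is illustrative, and the annotation schema itself is not specified here, so only the raw JSON is loaded.

```python
import json
from pathlib import Path

def captures_by_scene(valid_dir):
    """Group per-lighting captures by scene (illustrative helper).

    Every JSON file inside a scene directory is one lighting variation
    of the same camera pose and object arrangement.
    """
    scenes = {}
    for scene_dir in sorted(Path(valid_dir).glob("scene_*")):
        scenes[scene_dir.name] = sorted(scene_dir.glob("*.json"))
    return scenes

# Each capture is an independent test case for pose prediction:
# for scene, paths in captures_by_scene("valid").items():
#     for p in paths:
#         annotation = json.loads(p.read_text())
```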
Images were captured using a RealSense D415 RGBD camera. Systematic errors were observed in the depth values relative to the estimated distance of a calibration grid. To correct for this, depth frames are scaled by a factor of 0.98042517 before registering to RGB. Annotations were made manually using these corrected RGBD frames.
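The correction above amounts to multiplying raw depth values by a single global scale factor before depth-to-RGB registration. A minimal sketch follows; the factor comes from the text, while the function and array names are illustrative assumptions.

```python
import numpy as np

# Global scale factor from the dataset description, compensating for the
# systematic error observed in the D415 depth values.
DEPTH_SCALE = 0.98042517

def correct_depth(depth):
    """Scale raw depth values (any unit) by the global correction factor.

    Applied before registering the depth frame to the RGB frame.
    """
    return np.asarray(depth, dtype=np.float64) * DEPTH_SCALE
```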
NOTE: Only validation set annotations are included. Test annotations are managed by the BOP challenge.