In this dataset, various objects are arranged on a white table. A UR5e robot picks and place a target object specified on the title of the video/image sequence. Videos under auto- folder are collected with automatic operation of the robot. Videos under human- folders are collected with the tele-operation of the robot. Ground-truth tracking bounding boxes are generated with STARK, and when the target exits the camera frame, the bounding box estimation is switched to [-1, -1, -1, -1], indicating target not shown.
1 PAPER • NO BENCHMARKS YET