VidHOI is a video-based benchmark for human-object interaction detection. It is built on VidOR, which is densely annotated with all humans and predefined object categories appearing in each frame. VidOR is also more challenging because its videos are unconstrained, user-generated content and are therefore shaky at times.
6 PAPERS • 2 BENCHMARKS
Ambiguous-HOI is a challenging dataset of ambiguous human-object interaction images for HOI detection, built on HICO-DET.
2 PAPERS • NO BENCHMARKS YET
MPHOI-72 is a multi-person human-object interaction dataset that can be used for a wide variety of HOI/activity recognition and pose estimation/object tracking tasks. The dataset is challenging due to frequent occlusions among the humans and objects. It consists of 72 videos captured from 3 different angles at 30 fps, with a total of 26,383 frames and an average length of 12 seconds. It involves 5 humans performing in pairs, 6 object types, 3 activities and 13 sub-activities. The dataset includes color video, depth video, human skeletons, and human and object bounding boxes.
1 PAPER • NO BENCHMARKS YET
H²O is an image dataset annotated for human-to-human-or-object interaction detection. It is composed of images from the V-COCO dataset, augmented with images that mostly contain interactions between people. The dataset was introduced in: Orcesi, A., Audigier, R., Toukam, F. P., & Luvison, B. (2021, December). Detecting Human-to-Human-or-Object (H²O) Interactions with DIABOLO. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) (pp. 1-8). IEEE. The annotations were made with Pixano, an open-source, smart annotation tool for computer vision applications: https://pixano.cea.fr/
0 PAPERS • NO BENCHMARKS YET