The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain

12 Oct 2020 · Francesco Ragusa, Antonino Furnari, Salvatore Livatino, Giovanni Maria Farinella

Wearable cameras make it possible to collect images and videos of humans interacting with the world. While human-object interactions have been thoroughly investigated in third-person vision, the problem has been understudied in egocentric settings and in industrial scenarios. To fill this gap, we introduce MECCANO, the first dataset of egocentric videos to study human-object interactions in industrial-like settings. MECCANO has been acquired by 20 participants who were asked to build a motorbike model, for which they had to interact with tiny objects and tools. The dataset has been explicitly labeled for the task of recognizing human-object interactions from an egocentric perspective. Specifically, each interaction has been labeled both temporally (with action segments) and spatially (with active object bounding boxes). With the proposed dataset, we investigate four different tasks: 1) action recognition, 2) active object detection, 3) active object recognition and 4) egocentric human-object interaction detection, which is a revisited version of the standard human-object interaction detection task. Baseline results show that the MECCANO dataset is a challenging benchmark for studying egocentric human-object interactions in industrial-like scenarios. We publicly release the dataset at
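To make the two kinds of labels concrete, here is a minimal Python sketch of how a single egocentric interaction annotation might be represented: a temporal segment (start and end frame) carrying a verb, plus the active objects with their bounding boxes. The class and field names (`ActiveObject`, `InteractionAnnotation`, `start_frame`, `end_frame`, `as_pairs`), the (x1, y1, x2, y2) box convention and the example labels are hypothetical illustrations, not the official MECCANO annotation format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class ActiveObject:
    """An object the camera wearer is interacting with (hypothetical schema)."""
    label: str
    box: Box


@dataclass
class InteractionAnnotation:
    """One labeled interaction: a temporal action segment plus spatial object labels."""
    verb: str
    start_frame: int  # first frame of the action segment (inclusive)
    end_frame: int    # last frame of the action segment (inclusive)
    active_objects: List[ActiveObject] = field(default_factory=list)

    def contains(self, frame: int) -> bool:
        """Return True if `frame` lies inside the temporal segment."""
        return self.start_frame <= frame <= self.end_frame

    def as_pairs(self) -> List[Tuple[str, str]]:
        """Flatten the interaction into (verb, object) pairs, one per active object."""
        return [(self.verb, obj.label) for obj in self.active_objects]


# Illustrative usage with made-up labels and coordinates.
ann = InteractionAnnotation(
    verb="screw",
    start_frame=120,
    end_frame=245,
    active_objects=[ActiveObject("screwdriver", (10.0, 20.0, 80.0, 90.0))],
)
print(ann.contains(200), ann.as_pairs())
```

Keeping the verb on the segment and the boxes on the objects mirrors the abstract's split between temporal and spatial labeling; an interaction involving several tools simply lists several active objects.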

Introduced in the Paper:

MECCANO

Used in the Paper:

Kinetics ActivityNet HICO-DET V-COCO EGTEA

Results from the Paper

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Action Recognition | MECCANO | SlowFast | Top-1 Accuracy | 42.85 | #1
Object Recognition | MECCANO | Faster-RCNN | mAP | 30.39 | #1
Human-Object Interaction Detection | MECCANO | SlowFast + FasterRCNN | mAP@0.5 role | 25.93 | #1
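The table mixes a classification metric with detection metrics. As a rough guide, here is a minimal Python sketch of the building blocks: Top-1 accuracy (the fraction of samples whose highest-scoring class equals the ground truth) and the intersection-over-union (IoU) test commonly used to decide whether a predicted box matches a ground-truth box. The `is_true_positive` rule (verb and object class must match and the object boxes must overlap with IoU >= 0.5) is an assumption about how "mAP@0.5 role" is scored, modeled on the role AP of V-COCO; the paper defines the exact protocol. Computing the full mAP (ranking detections by confidence and integrating precision over recall) is omitted.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) convention


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class index equals the label."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = sum(
        1
        for s, y in zip(scores, labels)
        if max(range(len(s)), key=s.__getitem__) == y  # argmax, ties -> first index
    )
    return correct / len(labels)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_verb: str, pred_obj: str, pred_box: Box,
                     gt_verb: str, gt_obj: str, gt_box: Box,
                     thresh: float = 0.5) -> bool:
    """Assumed role-style matching rule: classes agree and box IoU >= thresh."""
    return (pred_verb == gt_verb
            and pred_obj == gt_obj
            and iou(pred_box, gt_box) >= thresh)


print(top1_accuracy([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], [1, 2]))  # 0.5
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
```

Under this rule a detection with the right verb but the wrong object class, or with a box that overlaps the ground truth too little, counts as a false positive, which is one reason the interaction-detection score sits below the object-recognition score.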


No methods listed for this paper. Add relevant methods here