TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Human-Object Interaction Anticipation	VidHOI	ST-GAZE	Person-wise Top5: t=1(mAP@0.5)	37.59	# 2
Human-Object Interaction Anticipation	VidHOI	ST-GAZE	Person-wise Top5: t=3(mAP@0.5)	33.14	# 2
Human-Object Interaction Anticipation	VidHOI	ST-GAZE	Person-wise Top5: t=5(mAP@0.5)	32.75	# 2
Human-Object Interaction Detection	VidHOI	ST-GAZE	Oracle: Full (mAP@0.5)	38.61	# 2
Human-Object Interaction Detection	VidHOI	ST-GAZE	Oracle: Rare (mAP@0.5)	27.99	# 2
Human-Object Interaction Detection	VidHOI	ST-GAZE	Oracle: Non-Rare (mAP@0.5)	52.44	# 2
Human-Object Interaction Detection	VidHOI	ST-GAZE	Detection: Full (mAP@0.5)	10.4	# 2
Human-Object Interaction Detection	VidHOI	ST-GAZE	Detection: Non-Rare (mAP@0.5)	16.83	# 2
Human-Object Interaction Detection	VidHOI	ST-GAZE	Detection: Rare (mAP@0.5)	5.46	# 2


Human-Object Interaction Prediction in Videos through Gaze Following

6 Jun 2023 · Zhifan Ni, Esteve Valls Mascaró, Hyemin Ahn, Dongheui Lee

Understanding human-object interactions (HOIs) in a video is essential to fully comprehending a visual scene. Prior work has addressed this by detecting HOIs in images and, more recently, in videos. However, video-based HOI anticipation from a third-person view remains understudied. In this paper, we design a framework that detects current HOIs and anticipates future HOIs in videos. We propose leveraging human gaze information, since people often fixate on an object before interacting with it. These gaze features are fused with the scene context and the visual appearance of human-object pairs through a spatio-temporal transformer. To evaluate the model on HOI anticipation in multi-person scenarios, we propose a set of person-wise multi-label metrics. Our model is trained and validated on the VidHOI dataset, which contains videos of daily life and is currently the largest video HOI dataset. Experimental results on the HOI detection task show that our approach improves on the baseline by a relative margin of 36.3%. Moreover, we conduct an extensive ablation study to demonstrate the effectiveness of our modifications and extensions to the spatio-temporal transformer. Our code is publicly available at https://github.com/nizhf/hoi-prediction-gaze-transformer.
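The 36.3% figure in the abstract is a relative improvement, that is, a ratio against the baseline score rather than a difference in mAP points. The baseline score itself is not listed on this page, so the minimal sketch below illustrates the same arithmetic on two ST-GAZE values that are listed in the results table (Person-wise Top5 at t=1 and t=5). The helper name `relative_change` is hypothetical and is not taken from the paper or its repository.

```python
def relative_change(reference: float, value: float) -> float:
    """Return the change from `reference` to `value` as a percentage of `reference`.

    Hypothetical helper for illustration only; not part of the authors' code.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (value - reference) / reference * 100.0


# ST-GAZE anticipation scores copied from the results table on this page.
map_t1 = 37.59  # Person-wise Top5, t=1 (mAP@0.5)
map_t5 = 32.75  # Person-wise Top5, t=5 (mAP@0.5)

absolute_drop = map_t1 - map_t5                  # difference in mAP points
relative_drop = relative_change(map_t1, map_t5)  # percentage of the t=1 score

print(f"absolute: {absolute_drop:.2f} points, relative: {relative_drop:.1f}%")
```

Under these inputs, a drop of 4.84 mAP points corresponds to a relative decrease of roughly 12.9%, which shows why a relative margin and an absolute margin should not be compared directly.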

Code

nizhf/hoi-prediction-gaze-transform… official

Tasks

Human-Object Interaction Anticipation

Human-Object Interaction Detection

Object

Datasets

VidHOI

Results from the Paper

Ranked #2 on Human-Object Interaction Anticipation on VidHOI

