Zero-Shot Human-Object Interaction Detection

RLIPv2: Fast Scaling of Relational Language-Image Pre-training

jacobyuan7/rlipv2 ICCV 2023

In this paper, we propose RLIPv2, a fast-converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data.

ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection

yeliudev/ConsNet 14 Aug 2020

We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.

End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation

mrwu-mac/EoID 1 Apr 2022

Extensive experiments on the HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs.

RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning

NVlabs/RelViT ICLR 2022

This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

IDEA-Research/DiffHOI 20 May 2023

Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.