Zero-Shot Human-Object Interaction Detection
5 papers with code • 2 benchmarks • 2 datasets
Most implemented papers
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data.
ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
Extensive experiments on the HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs.
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.
Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model
Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.