Zero-Shot Human-Object Interaction Detection
7 papers with code • 2 benchmarks • 2 datasets
Most implemented papers
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data.
ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection
We consider the problem of Human-Object Interaction (HOI) detection, which aims to locate and recognize HOI instances in images in the form of <human, action, object> triplets.
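The <human, action, object> triplet above can be sketched as a small data structure; the class and field names below are illustrative, not taken from any of the listed papers.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical representation of one HOI instance: a human box, an object box
# (both as (x1, y1, x2, y2) pixel coordinates), and the action/object labels.
@dataclass
class HOIInstance:
    human_box: Tuple[float, float, float, float]
    object_box: Tuple[float, float, float, float]
    action: str          # e.g. "ride"
    object_label: str    # e.g. "bicycle"

hoi = HOIInstance((10, 20, 110, 220), (30, 150, 200, 260), "ride", "bicycle")
```

An HOI detector outputs a set of such triplets per image; zero-shot methods must produce them for (action, object) combinations never seen in training.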
End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
Extensive experiments on the HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs.
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.
Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model
Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.
Boosting Zero-Shot Human-Object Interaction Detection with Vision-Language Transfer
Human-Object Interaction (HOI) detection is a crucial task that involves localizing interactive human-object pairs and identifying the actions being performed.
Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection
Then, we extract real features of seen samples and mix them with synthetic features, allowing the model to be trained on seen and unseen classes jointly.
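The mixing step above can be sketched as pooling real features of seen classes with generated features of unseen classes into one training set; this is a minimal illustration under assumed names, not the paper's implementation.

```python
import random

# Real features extracted from images of seen HOI classes (feature, label).
seen_real = [([0.9, 0.1, 0.3], "seen:ride_bicycle"),
             ([0.8, 0.2, 0.1], "seen:hold_cup")]

# Synthetic features generated for unseen HOI classes, e.g. conditioned on
# text embeddings of the class names (values here are placeholders).
unseen_synth = [([0.2, 0.8, 0.5], "unseen:hold_umbrella")]

# Mix both pools and shuffle, so one classifier trains on seen and unseen
# classes jointly rather than on seen classes alone.
train_pool = seen_real + unseen_synth
random.shuffle(train_pool)
```

The design intent is that the classifier's decision boundary covers unseen classes at test time, instead of collapsing all predictions onto the seen classes.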