Human-Object Interaction Detection
132 papers with code • 6 benchmarks • 22 datasets
Human-Object Interaction (HOI) detection is the task of identifying "a set of interactions" in an image. It involves (i) localizing the subject (i.e., humans) and target (i.e., objects) of each interaction and (ii) classifying the interaction labels.
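As a rough illustration of the definition above, one detected interaction can be represented as a human box, an object box, and an interaction label. The sketch below is a minimal example under stated assumptions, not the format of any particular benchmark or library: the names `HOIInstance`, `iou`, and `matches` are hypothetical, and the 0.5 IoU threshold is an assumed default for deciding whether a predicted box localizes a ground-truth box.

```python
from dataclasses import dataclass
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2) in pixels. This layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOIInstance:
    """One interaction: a localized human, a localized object, and a label."""
    human_box: Box
    object_box: Box
    object_label: str   # e.g. "bicycle" (hypothetical label)
    interaction: str    # e.g. "ride" (hypothetical label)
    score: float = 1.0  # detector confidence; 1.0 for ground truth


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: HOIInstance, gt: HOIInstance, iou_threshold: float = 0.5) -> bool:
    """True if the labels agree and BOTH boxes overlap the ground truth enough.

    The threshold value is an assumption made for this sketch.
    """
    return (
        pred.interaction == gt.interaction
        and pred.object_label == gt.object_label
        and iou(pred.human_box, gt.human_box) >= iou_threshold
        and iou(pred.object_box, gt.object_box) >= iou_threshold
    )


gt = HOIInstance((0, 0, 100, 200), (80, 100, 180, 200), "bicycle", "ride")
pred = HOIInstance((5, 0, 100, 200), (80, 100, 180, 200), "bicycle", "ride", 0.9)
print(matches(pred, gt))
```

Requiring both boxes and the label to agree reflects the two parts of the task: a prediction with the right interaction label but a poorly localized human or object does not count as a match in this sketch.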
Latest papers with no code
Generating Human Interaction Motions in Scenes with Text Control
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model, emphasizing goal-reaching constraints on large-scale motion-capture datasets.
HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment
Humans naturally interact with both others and the surrounding multiple objects, engaging in various social activities.
InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction
However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions.
InterFusion: Text-Driven Generation of 3D Human-Object Interaction
In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner.
FORCE: Dataset and Method for Intuitive Physics Guided Human-object Interaction
Our key insight is that human motion is dictated by the interrelation between the force exerted by the human and the perceived resistance.
THOR: Text to Human-Object Interaction Diffusion via Relation Intervention
This paper addresses new methodologies to deal with the challenging task of generating dynamic Human-Object Interactions from textual descriptions (Text2HOI).
Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning
Human-centered dynamic scene understanding plays a pivotal role in enhancing the capability of robotic and autonomous systems, in which Video-based Human-Object Interaction (V-HOI) detection is a crucial task in semantic scene understanding, aimed at comprehensively understanding HOI relationships within a video to benefit the behavioral decisions of mobile robots and autonomous driving systems.
Towards Zero-shot Human-Object Interaction Detection via Vision-Language Integration
Human-object interaction (HOI) detection aims to locate human-object pairs and identify their interaction categories in images.
Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning
Several approaches aim to efficiently adapt VLP models to downstream tasks with limited supervision, leveraging the knowledge these models have already acquired.
FreeA: Human-object Interaction Detection using Free Annotation Labels
Recent human-object interaction (HOI) detection approaches depend on costly manual labor and require comprehensively annotated image datasets.