Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.
#3 best model for Human-Object Interaction Detection on HICO-DET
Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.
#3 best model for Action Classification on Moments in Time
For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.
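The parse graph described above can be pictured as a small data structure: an adjacency matrix for the HOI graph structure plus one label per node. The following is only a minimal sketch under stated assumptions, not the GPNN implementation. The class name `ParseGraph`, the `edges` helper, the soft link strengths in [0, 1], the 0.5 threshold, and the example labels are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ParseGraph:
    """Hypothetical container for a parse graph: graph structure plus node labels."""
    adjacency: List[List[float]]  # adjacency[i][j]: assumed link strength between nodes i and j
    node_labels: List[str]        # one label per node (human or object)

    def __post_init__(self) -> None:
        # The adjacency matrix must be square and match the number of nodes.
        n = len(self.node_labels)
        if len(self.adjacency) != n or any(len(row) != n for row in self.adjacency):
            raise ValueError("adjacency must be n x n for n node labels")

    def edges(self, threshold: float = 0.5) -> List[Tuple[int, int]]:
        """Return node pairs (i, j) with i < j whose link strength exceeds the threshold."""
        n = len(self.node_labels)
        return [
            (i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if self.adjacency[i][j] > threshold
        ]


# Illustrative scene with three nodes; values and labels are made up.
graph = ParseGraph(
    adjacency=[
        [0.0, 0.9, 0.1],
        [0.9, 0.0, 0.2],
        [0.1, 0.2, 0.0],
    ],
    node_labels=["human: ride", "bicycle: ride", "bench: none"],
)
linked = [(graph.node_labels[i], graph.node_labels[j]) for i, j in graph.edges()]
```

Here `linked` holds the label pairs for the nodes the sketch treats as interacting. A real model would infer both the matrix and the labels from image features rather than take them as input.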
Models need to distinguish different human instances in the image panel and learn rich features to represent the details of each instance.
SOTA for Pose Estimation on DensePose-COCO
Because interactiveness generalizes, the interactiveness network is a transferable knowledge learner and can be combined with any HOI detection model to achieve desirable results.
Skeleton-based action recognition has made great progress recently, but many problems still remain unsolved.
Our core idea is that the appearance of a person or an object instance contains informative cues on which relevant parts of an image to attend to for facilitating interaction prediction.
We show that with an appropriate factorization, and encodings of layout and appearance constructed from outputs of pretrained object detectors, a relatively simple model outperforms more sophisticated approaches on human-object interaction detection.