Video-to-image Affordance Grounding

2 papers with code • 3 benchmarks • 2 datasets

Given a demonstration video V and a target image I, the goal of video-to-image affordance grounding is to predict an affordance heatmap over the target image that highlights the region corresponding to where the hand interacts in the video, together with the affordance action (e.g., press, turn).

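In code terms, a model for this task consumes the demonstration video and the target image and emits a heatmap plus an action prediction. The sketch below is a minimal, hypothetical PyTorch formulation of that interface, assuming a shared convolutional encoder and mean temporal pooling; it illustrates the input/output contract only and does not reflect the architecture of either paper listed below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffordanceGrounder(nn.Module):
    """Illustrative sketch of the video-to-image affordance grounding
    interface. All module choices (shared conv encoder, mean temporal
    pooling, concat fusion) are assumptions for demonstration."""

    def __init__(self, num_actions: int = 7, dim: int = 64):
        super().__init__()
        # Hypothetical encoder shared by video frames and target image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.heatmap_head = nn.Conv2d(2 * dim, 1, kernel_size=1)
        self.action_head = nn.Linear(2 * dim, num_actions)

    def forward(self, video: torch.Tensor, image: torch.Tensor):
        # video: (B, T, 3, H, W) demonstration; image: (B, 3, H, W) target.
        B, T, C, H, W = video.shape
        vid_feat = self.encoder(video.flatten(0, 1))            # (B*T, D, h, w)
        vid_feat = vid_feat.view(B, T, *vid_feat.shape[1:]).mean(dim=1)  # pool over time
        img_feat = self.encoder(image)                          # (B, D, h, w)
        fused = torch.cat([vid_feat, img_feat], dim=1)          # (B, 2D, h, w)
        # Per-pixel affordance heatmap over the target image.
        heatmap = torch.sigmoid(self.heatmap_head(fused))
        heatmap = F.interpolate(heatmap, size=(H, W),
                                mode="bilinear", align_corners=False)
        # Affordance action logits (e.g., press, turn).
        action_logits = self.action_head(fused.mean(dim=(2, 3)))
        return heatmap, action_logits
```

Calling `model(video, image)` with a `(B, T, 3, H, W)` video and a `(B, 3, H, W)` image returns a `(B, 1, H, W)` heatmap and `(B, num_actions)` action logits, which matches the task's expected outputs.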
Most implemented papers

Grounded Human-Object Interaction Hotspots from Video

Tushar-N/interaction-hotspots ICCV 2019

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements.

Affordance Grounding from Demonstration Video to Target Image

showlab/afformer CVPR 2023

Humans excel at learning from expert demonstrations and solving their own problems.