Video-to-image Affordance Grounding
2 papers with code • 3 benchmarks • 2 datasets
Given a demonstration video V and a target image I, the goal of video-to-image affordance grounding is to predict an affordance heatmap over the target image, localizing the region corresponding to where the hand interacts in the video, along with the affordance action (e.g., press, turn).
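The task interface can be sketched minimally as follows. This is a hypothetical placeholder, not any published model: the function name `ground_affordance`, the action vocabulary, and the uniform heatmap are all illustrative assumptions; a real method would learn the video-to-image mapping.

```python
import numpy as np

def ground_affordance(video, target_image, actions=("press", "turn")):
    """Hypothetical interface sketch for video-to-image affordance grounding.

    video: (T, H, W, C) demonstration video clip
    target_image: (H, W, C) target image I
    Returns a normalized affordance heatmap over I and a predicted action.
    A real model would infer the hand-interacted region from the video;
    here a uniform heatmap stands in as a placeholder.
    """
    H, W, _ = target_image.shape
    heatmap = np.ones((H, W)) / (H * W)  # placeholder, sums to 1
    action = actions[0]                  # placeholder action label
    return heatmap, action

video = np.random.rand(16, 64, 64, 3)  # T x H x W x C demonstration video V
image = np.random.rand(64, 64, 3)      # target image I
heatmap, action = ground_affordance(video, image)
```

The heatmap is typically evaluated against human-annotated interaction regions, and the action prediction against a fixed action vocabulary.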
Most implemented papers
Grounded Human-Object Interaction Hotspots from Video
Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements.
Affordance Grounding from Demonstration Video to Target Image
Humans excel at learning from expert demonstrations and solving their own problems.