Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics.
In this paper, we propose AniPixel, a novel animatable and generalizable human avatar reconstruction method that leverages pixel-aligned features for body geometry prediction and RGB color blending.
Specifically, when applying the proposed module, it employs a two-stream pipeline during training, i. e., either with or without a BatchFormerV2 module, where the batchformer stream can be removed for testing.
Therefore, the proposed method enables the learning on both known and unknown HOI concepts.
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications without any bells and whistles, including the tasks of long-tailed recognition, compositional zero-shot learning, domain generalization, and contrastive learning.
Ranked #15 on Long-tail Learning on iNaturalist 2018
The proposed method can thus be used to 1) improve the performance of HOI detection, especially for the HOIs with unseen objects; and 2) infer the affordances of novel objects.
Ranked #2 on Affordance Recognition on HICO-DET(Unknown Concepts)
With the proposed object fabricator, we are able to generate large-scale HOI samples for rare and unseen categories to alleviate the open long-tailed issues in HOI detection.
Ranked #4 on Affordance Recognition on HICO-DET
The integration of decomposition and composition enables VCL to share object and verb features among different HOI samples and images, and to generate new interaction samples and new types of HOI, and thus largely alleviates the long-tail distribution problem and benefits low-shot or zero-shot HOI detection.
Ranked #3 on Affordance Recognition on HICO-DET(Unknown Concepts)