Contact-conditioned hand-held object reconstruction from single-view images

Reconstructing the shape of hand-held objects from single-view color images is a long-standing problem in computer vision and computer graphics. The task is complicated by the ill-posed nature of single-view reconstruction and by potential occlusions from both the hand and the object. Previous works have mostly reduced this complexity by relying on known object templates as priors. In contrast, we propose a novel approach that requires no object template: instead, it exploits prior knowledge of contacts in hand-object interactions to train an attention-based network that reconstructs the hand-held object precisely in a single forward pass at inference time. The network encodes visual features together with contact features using a multi-head attention module, and the fused features condition the training of a neural field representation whose output is a Signed Distance Field of the reconstructed object. Extensive experiments on three well-known datasets demonstrate that our method achieves superior reconstructions compared to state-of-the-art techniques, even under severe occlusion.
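The abstract does not give implementation details, but the described pipeline can be sketched as follows: visual and contact feature tokens are fused with multi-head attention, and the fused embedding conditions an MLP neural field that predicts a signed distance for each query point. This is a minimal illustrative sketch, not the authors' code; all layer sizes, token counts, and the mean-pooling of the fused tokens are assumptions.

```python
# Illustrative sketch of a contact-conditioned SDF network (not the paper's code).
# Assumptions: 256-dim features, mean-pooled attention output as the condition.
import torch
import torch.nn as nn

class ContactConditionedSDF(nn.Module):
    def __init__(self, feat_dim=256, num_heads=8, hidden=512):
        super().__init__()
        # Multi-head attention fuses visual tokens with contact tokens.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Neural field: maps a 3D query point plus the fused condition
        # to a scalar signed distance.
        self.field = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, visual_tokens, contact_tokens, points):
        # visual_tokens: (B, Nv, feat_dim); contact_tokens: (B, Nc, feat_dim)
        # points: (B, P, 3) query locations in object space.
        tokens = torch.cat([visual_tokens, contact_tokens], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)   # (B, Nv+Nc, feat_dim)
        cond = fused.mean(dim=1, keepdim=True)         # pooled condition (B, 1, feat_dim)
        cond = cond.expand(-1, points.shape[1], -1)    # broadcast to every query point
        sdf = self.field(torch.cat([points, cond], dim=-1))
        return sdf.squeeze(-1)                         # (B, P) signed distances

# Example usage with random inputs (batch of 2, 196 visual and 64 contact tokens):
model = ContactConditionedSDF()
sdf = model(torch.randn(2, 196, 256), torch.randn(2, 64, 256), torch.randn(2, 1024, 3))
```

The reconstructed mesh would then be extracted from the predicted Signed Distance Field, e.g. by running Marching Cubes over a dense grid of query points.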
