H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction
Real-time 3D hand mesh reconstruction is challenging, especially when the hand is holding an object. Going beyond previous methods, we design H2ONet to fully exploit non-occluded information from multiple frames to boost the reconstruction quality. First, we decouple hand mesh reconstruction into two branches, one to exploit finger-level non-occluded information and the other to exploit global hand orientation, with lightweight structures to promote real-time inference. Second, we propose finger-level occlusion-aware feature fusion, leveraging predicted finger-level occlusion information as guidance to fuse finger-level information across time frames. Further, we design hand-level occlusion-aware feature fusion to fetch non-occluded information from nearby time frames. We conduct experiments on the Dex-YCB and HO3D-v2 datasets with challenging hand-object occlusion cases, showing that H2ONet runs in real time and achieves state-of-the-art precision for both hand mesh and pose. The code will be released on GitHub.
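As a rough illustration of the idea of occlusion-guided fusion across frames, the following sketch weights each finger's per-frame feature by a predicted visibility score and sums over frames. This is not the H2ONet fusion module: the abstract does not specify the fusion operator, so the softmax weighting, the function name `fuse_fingers_across_frames`, the `temperature` parameter, and the array shapes are all assumptions made for illustration.

```python
import numpy as np


def fuse_fingers_across_frames(features, visibility, temperature=1.0):
    """Hypothetical occlusion-guided fusion (illustrative, not the paper's module).

    features:   array of shape (T, F, C), one C-dim feature per finger per frame.
    visibility: array of shape (T, F), predicted scores where higher means
                the finger is less occluded in that frame (assumed convention).
    Returns an array of shape (F, C): one fused feature per finger.
    """
    features = np.asarray(features, dtype=float)
    visibility = np.asarray(visibility, dtype=float)

    # Softmax over the time axis, computed separately for each finger.
    logits = visibility / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Weighted sum over frames: fingers that are visible in a frame
    # contribute more of that frame's feature.
    return (weights[..., None] * features).sum(axis=0)
```

With equal visibility scores this reduces to a plain average over frames; with a small `temperature` it approaches picking, for each finger, the frame in which that finger is most visible.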
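Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
3D Hand Pose Estimation | DexYCB | H2ONet | Average MPJPE (mm) | 14.0 | # 5 |
3D Hand Pose Estimation | DexYCB | H2ONet | Procrustes-Aligned MPJPE | 5.70 | # 5 |
3D Hand Pose Estimation | DexYCB | H2ONet | MPVPE | 13.0 | # 5 |
3D Hand Pose Estimation | DexYCB | H2ONet | VAUC | 76.2 | # 4 |
3D Hand Pose Estimation | DexYCB | H2ONet | PA-MPVPE | 5.5 | # 4 |
3D Hand Pose Estimation | DexYCB | H2ONet | PA-VAUC | 89.1 | # 3 |
3D Hand Pose Estimation | HO-3D | H2ONet | Average MPJPE (mm) | - | # 10 |
3D Hand Pose Estimation | HO-3D | H2ONet | ST-MPJPE (mm) | 23.0 | # 5 |
3D Hand Pose Estimation | HO-3D | H2ONet | PA-MPJPE (mm) | 9.0 | # 4 |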
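For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE and its Procrustes-aligned variant (PA-MPJPE) are commonly computed. It assumes `pred` and `gt` are `(N, 3)` arrays of joint positions in the same units, and it uses the standard SVD-based similarity alignment; the exact evaluation protocol of each benchmark (root alignment, joint ordering, units) may differ and is not taken from the source. Applying the same functions to mesh vertices instead of joints gives the per-vertex counterparts (MPVPE, PA-MPVPE).

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred to gt with the best similarity transform
    (rotation, uniform scale, translation) in the least-squares sense."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # SVD of the 3x3 cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    corr = np.array([1.0, 1.0, d])
    rot = vt.T @ np.diag(corr) @ u.T

    # Optimal uniform scale for the rotated, centered prediction.
    scale = (s * corr).sum() / (p ** 2).sum()
    return scale * p @ rot.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because the alignment removes global rotation, scale, and translation, PA-MPJPE measures only the error in articulated shape, which is why the aligned values in the table are much lower than the unaligned ones.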