SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds

3D hand pose estimation is an essential problem for human-computer interaction. Most existing depth-based hand pose estimation methods consume 2D depth maps or 3D volumes via 2D/3D convolutional neural networks (CNNs). In this paper, we propose a deep Semantic Hand Pose Regression network (SHPR-Net) for hand pose estimation from point sets, which consists of two subnetworks: a semantic segmentation subnetwork and a hand pose regression subnetwork. The semantic segmentation network assigns a semantic label to each point in the point set. The pose regression network integrates the semantic priors through both input and late fusion strategies and regresses the final hand pose. Two transformation matrices are learned from the point set and applied to transform the input point cloud and inversely transform the output pose, respectively, which makes SHPR-Net more robust to geometric transformations. Experiments on the NYU, ICVL and MSRA hand pose datasets demonstrate that SHPR-Net achieves performance on par with state-of-the-art methods. We also show that our method can be naturally extended to hand pose estimation from multi-view depth data and achieves a further improvement on the NYU dataset.
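
The abstract outlines the full pipeline: a learned transformation canonicalizes the input cloud, a segmentation subnetwork predicts per-point semantic labels, the regression subnetwork fuses those semantics both at the input and again late in the network, and the regressed pose is mapped back through the inverse transform. Below is a minimal PyTorch sketch of that data flow; the layer widths, the TNet design, and the number of hand parts (num_parts) and joints (num_joints) are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of the SHPR-Net pipeline described in the abstract.
# Layer sizes and fusion details are assumptions for illustration.
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Predicts a 3x3 transformation matrix from the input point set."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        self.fc = nn.Linear(128, 9)

    def forward(self, pts):                      # pts: (B, 3, N)
        feat = self.mlp(pts).max(dim=2).values   # global feature: (B, 128)
        mat = self.fc(feat).view(-1, 3, 3)
        # Bias toward identity so training starts near a no-op transform.
        return mat + torch.eye(3, device=pts.device)

class SHPRNet(nn.Module):
    def __init__(self, num_parts=6, num_joints=14):  # hypothetical sizes
        super().__init__()
        self.tnet = TNet()
        # Semantic segmentation subnetwork: per-point part labels.
        self.seg = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, num_parts, 1),
        )
        # Pose regression subnetwork with input fusion (xyz + semantics).
        self.reg = nn.Sequential(
            nn.Conv1d(3 + num_parts, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU(),
        )
        self.head = nn.Linear(256 + num_parts, num_joints * 3)

    def forward(self, pts):                       # pts: (B, 3, N)
        t = self.tnet(pts)
        pts_t = torch.bmm(t, pts)                 # transform the input cloud
        sem = self.seg(pts_t).softmax(dim=1)      # per-point semantic priors
        # Input fusion: concatenate coordinates with semantic probabilities.
        feat = self.reg(torch.cat([pts_t, sem], dim=1)).max(dim=2).values
        # Late fusion: pool the semantics into a global descriptor.
        sem_global = sem.max(dim=2).values
        pose = self.head(torch.cat([feat, sem_global], dim=1))
        pose = pose.view(-1, 3, pose.shape[1] // 3)
        # Inversely transform the regressed pose back to the input frame.
        return torch.bmm(torch.inverse(t), pose).transpose(1, 2)
```

Applying the inverse of the learned transform to the regressed joints returns the prediction to the original camera frame, so the canonicalization stays transparent to the evaluation metric.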


Results from the Paper

Task                  Dataset    Model     Metric                 Value  Global Rank
Hand Pose Estimation  NYU Hands  SHPR-Net  Average 3D Error (mm)  9.37   #12

Results from Other Papers

Task                  Dataset     Model     Metric                 Value  Rank
Hand Pose Estimation  ICVL Hands  SHPR-Net  Average 3D Error (mm)  7.22   #11
Hand Pose Estimation  MSRA Hands  SHPR-Net  Average 3D Error (mm)  7.76   #7

Methods


No methods listed for this paper.