V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

CVPR 2018. Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee

Most existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map...
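The abstract contrasts these 2D-input pipelines with V2V-PoseNet's voxelized 3D input representation. As an illustrative sketch (not the paper's code), a depth map can be lifted into a voxel occupancy grid by back-projecting each pixel with assumed pinhole intrinsics (fx, fy, cx, cy) and binning the resulting points into a cube cropped around a reference center; the 88^3 resolution matches the grid size reported in the paper, while the function name and parameters here are hypothetical:

```python
import numpy as np

def depth_to_voxels(depth, fx, fy, cx, cy, center, cube=250.0, res=88):
    """Back-project each depth pixel to 3D camera coordinates, then bin
    the points into a res^3 occupancy grid centered on `center`.
    Illustrative sketch only; `cube` is the crop side length in mm."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.ravel()
    valid = z > 0                       # ignore pixels with no depth
    x = (us.ravel() - cx) * z / fx      # pinhole back-projection
    y = (vs.ravel() - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)[valid]
    # normalize into [0, res) around the reference center
    idx = np.floor(((pts - center) / cube + 0.5) * res).astype(int)
    keep = np.all((idx >= 0) & (idx < res), axis=1)
    grid = np.zeros((res, res, res), dtype=np.float32)
    grid[tuple(idx[keep].T)] = 1.0      # mark occupied voxels
    return grid
```

A 3D CNN can then consume this grid directly, which avoids the perspective distortion the abstract attributes to feeding a raw 2D depth map into a 2D CNN.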


Evaluation results from the paper


| Task                 | Dataset        | Model       | Metric           | Value | Global rank |
|----------------------|----------------|-------------|------------------|-------|-------------|
| Hand Pose Estimation | HANDS 2017     | V2V-PoseNet | Average 3D Error | 9.95  | #1          |
| Hand Pose Estimation | ICVL Hands     | V2V-PoseNet | Average 3D Error | 6.28  | #1          |
| Pose Estimation      | ITOP front-view| V2V-PoseNet | Mean mAP         | 88.74 | #1          |
| Pose Estimation      | ITOP top-view  | V2V-PoseNet | Mean mAP         | 83.44 | #1          |
| Hand Pose Estimation | MSRA Hands     | V2V-PoseNet | Average 3D Error | 7.49  | #2          |
| Hand Pose Estimation | NYU Hands      | V2V-PoseNet | Average 3D Error | 8.42  | #1          |