Hand pose estimation is the task of locating the joints of the hand in an image or a sequence of video frames.
This paper focuses on vision-based hand pose estimation from a single depth map using convolutional neural networks (CNNs).
We introduce the concept of normalized diversity, which forces the model to preserve the normalized pairwise distances between sparse samples from a latent parametric distribution and their corresponding high-dimensional outputs.
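A minimal PyTorch sketch of such a pairwise-distance-preservation loss; the function name, the mean-based normalization, and all dimensions are assumptions for illustration, not the paper's actual formulation:

```python
import torch

def normalized_diversity_loss(z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Penalize mismatch between normalized pairwise distances in the
    latent space (z: B x Dz) and in the output space (y: B x Dy)."""
    dz = torch.cdist(z, z, p=2)        # B x B latent distances
    dy = torch.cdist(y, y, p=2)        # B x B output distances
    # Normalize each distance matrix by its mean to remove scale ambiguity.
    dz = dz / (dz.mean() + 1e-8)
    dy = dy / (dy.mean() + 1e-8)
    return torch.mean((dz - dy) ** 2)

# Usage: sample sparse latents, decode them, and add this term to the loss.
z = torch.randn(16, 32)                # 16 latent samples
y = torch.randn(16, 63)                # e.g., 21 hand joints x 3 coordinates
loss = normalized_diversity_loss(z, y)
```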
In this paper, we present a HAnd Mesh Recovery (HAMR) framework to tackle the problem of reconstructing the full 3D mesh of a human hand from a single RGB image.
We present the first method to capture the 3D total motion of a target person from a monocular view input.
In this work, we remove this requirement by learning to map from the features of real data to the features of synthetic data mainly using a large amount of synthetic and unlabeled real data.
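One way to realize such a mapping is a small residual network from real-image features into the synthetic feature space. The sketch below assumes paired real/synthetic features and an L2 objective; the dimensions and the pairing are illustrative, not the paper's setup:

```python
import torch
import torch.nn as nn

class FeatureMapper(nn.Module):
    """Maps features of real images into the synthetic feature space."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, real_feat: torch.Tensor) -> torch.Tensor:
        # Residual mapping: real features plus a learned correction.
        return real_feat + self.net(real_feat)

mapper = FeatureMapper()
real_feat = torch.randn(8, 1024)   # features extracted from real images
synth_feat = torch.randn(8, 1024)  # features from matching synthetic images
loss = nn.functional.mse_loss(mapper(real_feat), synth_feat)
```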
In this paper, we propose a novel method that seeks to predict the 3D position of the hand using both synthetic and partially-labeled real data.
To exploit this observation, we train a model that -- given input from one view -- estimates a latent representation, which is trained to be predictive for the appearance of the object when captured from another viewpoint.
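A toy PyTorch sketch of this cross-view objective, assuming paired images of the same hand from two viewpoints; the encoder/decoder architecture and image sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

class CrossViewModel(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, view_a: torch.Tensor) -> torch.Tensor:
        z = self.encoder(view_a)   # latent representation from one view
        return self.decoder(z)     # predicted appearance from the other view

model = CrossViewModel()
view_a = torch.randn(4, 1, 64, 64)  # input viewpoint
view_b = torch.randn(4, 1, 64, 64)  # target viewpoint used as supervision
loss = nn.functional.mse_loss(model(view_a), view_b)
```

Because the supervision signal is the second view itself, the latent code is pushed to encode view-independent structure rather than the pixels of the input view.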
Official Torch7 implementation of "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", CVPR 2018
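As background on the input representation, a depth map can be converted into the binary voxel occupancy grid that a voxel-to-voxel network consumes roughly as below; the camera intrinsics, cube size, and crude hand-center estimate are assumed values for illustration:

```python
import numpy as np

def depth_to_voxels(depth, fx=588.0, fy=587.0, cx=160.0, cy=120.0,
                    grid=88, cube=250.0):
    """Back-project a depth map (mm) to 3D points and fill an occupancy grid."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)[z > 0]
    center = pts.mean(axis=0)                    # crude hand-center estimate
    # Map points into a cube of side `cube` mm centered on the hand.
    idx = (((pts - center) / cube + 0.5) * grid).astype(int)
    keep = np.all((idx >= 0) & (idx < grid), axis=1)
    voxels = np.zeros((grid, grid, grid), dtype=np.float32)
    voxels[tuple(idx[keep].T)] = 1.0
    return voxels

voxels = depth_to_voxels(np.random.rand(240, 320) * 800.0)  # dummy depth map
```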
Specifically, we decompose the pose parameters into a set of per-pixel estimations, i.e., 2D heat maps, 3D heat maps, and unit 3D directional vector fields.
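A small NumPy sketch of constructing such per-pixel targets for a single joint; the Gaussian sigma and the per-pixel back-projected 3D points are assumptions for illustration:

```python
import numpy as np

def joint_targets(joint_uv, joint_xyz, points_xyz, size=64, sigma=2.0):
    """joint_uv: (2,) pixel coords; joint_xyz: (3,) joint position;
    points_xyz: (size, size, 3) back-projected 3D point per pixel."""
    uu, vv = np.meshgrid(np.arange(size), np.arange(size))
    # 2D heat map: Gaussian centered at the joint's pixel location.
    heat2d = np.exp(-((uu - joint_uv[0]) ** 2 + (vv - joint_uv[1]) ** 2)
                    / (2 * sigma ** 2))
    # Unit 3D directional vector field: per-pixel vector toward the joint.
    vec = joint_xyz[None, None, :] - points_xyz           # (size, size, 3)
    norm = np.linalg.norm(vec, axis=-1, keepdims=True)
    unit_dirs = vec / np.maximum(norm, 1e-8)
    return heat2d, unit_dirs

heat2d, dirs = joint_targets(np.array([32.0, 20.0]),
                             np.array([0.1, -0.05, 0.6]),
                             np.random.rand(64, 64, 3))
```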