The method is used to train a hand keypoint detector for single images.
We address two problems: first, we establish an easy method for capturing and labeling 3D keypoints on desktop objects with an RGB camera; and second, we develop a deep neural network, called KeyPose, that learns to accurately predict object poses using 3D keypoints from stereo input, and works even for transparent objects.
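KeyPose works from stereo input. One common way to turn a stereo keypoint estimate into 3D is to predict image coordinates plus disparity and back-project them with the pinhole model. The sketch below assumes a rectified stereo pair with square pixels, a known focal length (in pixels), a known baseline and a known principal point. The function `uvd_to_xyz` and its parameters are hypothetical and are not taken from the KeyPose code.

```python
def uvd_to_xyz(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Back-project a keypoint (u, v, disparity) from the left image of a
    rectified stereo pair to camera-frame coordinates (X, Y, Z).

    Assumes square pixels (fx == fy == focal_px) and a horizontal baseline.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity  # depth from disparity: Z = f * B / d
    x = (u - cx) * z / focal_px            # pinhole back-projection
    y = (v - cy) * z / focal_px
    return x, y, z
```

For example, `uvd_to_xyz(420.0, 240.0, 25.0, focal_px=500.0, baseline_m=0.1, cx=320.0, cy=240.0)` gives (0.4, 0.0, 2.0) up to floating-point rounding. Depth resolution degrades as disparity shrinks, which is one reason stereo keypoint methods tend to work best at close range.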
The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long- and short-range skip connections to improve information flow, allowing for a drastically faster network without compromising accuracy.
Ranked #2 on 3D Multi-Person Pose Estimation on MuPoTS-3D
In this paper, we apply multiscale area attention in a deep convolutional neural network to attend to emotional characteristics at varied granularities, so that the classifier can benefit from an ensemble of attentions at different scales.
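Area attention lets a query attend to contiguous groups of items ("areas") rather than only to single items, and allowing several area sizes gives the multiscale behaviour described above. The following is a minimal 1-D sketch in plain Python. It assumes mean-pooled area keys and summed area values (one common choice in area attention); the paper applies the idea to 2-D convolutional feature maps, and `area_attention_1d` and `max_area` are hypothetical names.

```python
import math

def area_attention_1d(query, keys, values, max_area=3):
    """Attend from one query over all contiguous areas of length 1..max_area.

    Area key   = mean of the member keys   (assumed pooling choice).
    Area value = sum of the member values  (assumed pooling choice).
    """
    n = len(keys)
    d_v = len(values[0])
    area_keys, area_values = [], []
    for size in range(1, max_area + 1):        # one "scale" per area size
        for start in range(n - size + 1):      # every contiguous span of that size
            span_k = keys[start:start + size]
            span_v = values[start:start + size]
            area_keys.append([sum(col) / size for col in zip(*span_k)])
            area_values.append([sum(col) for col in zip(*span_v)])
    # Scaled dot-product scores over all areas of all sizes at once.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, ak)) for ak in area_keys]
    m = max(scores)                            # subtract max for a stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * av[j] for w, av in zip(weights, area_values)) for j in range(d_v)]
```

With `max_area=1` this reduces to ordinary scaled dot-product attention over single items; larger values add coarser areas to the same softmax, so the weights themselves decide how much each granularity contributes.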
Human motion is fundamental to understanding behavior.
Ranked #5 on 3D Human Pose Estimation on 3DPW (using extra training data)
3D Pose Estimation, 3D Shape Reconstruction, Monocular 3D Human Pose Estimation, Motion Capture
We use spatially sparse two-, three- and four-dimensional convolutional autoencoder networks to model sparse structures in 2D space, 3D space, and (3+1)-dimensional space-time.
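The point of spatially sparse convolution is that computation happens only at active sites, and the same code can work in any number of dimensions. Below is a minimal pure-Python sketch of one common variant, the submanifold sparse convolution (outputs are produced only at sites that are active in the input), with active sites stored in a dictionary keyed by integer coordinate tuples. It illustrates the idea under these assumptions and is not necessarily the exact operator used in the work summarized above; `box_kernel` and `submanifold_sparse_conv` are hypothetical names.

```python
import itertools

def box_kernel(dim, radius=1, weight=1.0):
    """Dense (2*radius+1)^dim kernel of constant weight, keyed by offset tuples."""
    return {off: weight for off in itertools.product(range(-radius, radius + 1), repeat=dim)}

def submanifold_sparse_conv(active, kernel):
    """Sparse convolution whose output is defined only at input-active sites.

    active: dict mapping integer coordinate tuples (any dimension) to scalar features.
    kernel: dict mapping offset tuples to weights (cross-correlation convention).
    """
    out = {}
    for coord in active:                       # visit only active sites
        acc = 0.0
        for offset, w in kernel.items():
            neighbor = tuple(c + o for c, o in zip(coord, offset))
            feat = active.get(neighbor)        # inactive sites contribute zero
            if feat is not None:
                acc += w * feat
        out[coord] = acc
    return out
```

Because sites are dictionary keys, the cost scales with the number of active sites times the kernel size rather than with the volume of the grid, and the same function accepts 2-D, 3-D or (3+1)-D space-time coordinates, e.g. `submanifold_sparse_conv(sites, box_kernel(4))` for tuples of length 4.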
To construct FrankMocap, we build the state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole-body parametric model SMPL-X.
3D Hand Pose Estimation, 3D Human Reconstruction, 3D Pose Estimation, Motion Capture
The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations.
Ranked #1 on Video Prediction on CMU Mocap-2
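To make the adjoint sensitivity method concrete: instead of storing the forward trajectory, the gradient is obtained by solving an augmented ODE backwards in time for the state z, the adjoint a = dL/dz(t), and an accumulator g for the parameter gradient. The sketch below uses a hypothetical scalar toy problem, dz/dt = -theta * z with loss L = z(T), whose exact gradient dL/dtheta = -T * z0 * exp(-theta * T) makes the result easy to check. It uses a hand-written fixed-step RK4 solver and is not the implementation of any particular library.

```python
import math

def dynamics(z, theta):
    """Toy ODE dz/dt = f(z, theta) = -theta * z (hypothetical example)."""
    return -theta * z

def augmented_dynamics(state, theta):
    """Time derivative of the augmented state (z, a, g) used by the adjoint method."""
    z, a, g = state
    df_dz = -theta   # partial f / partial z
    df_dtheta = -z   # partial f / partial theta
    return (dynamics(z, theta), -a * df_dz, -a * df_dtheta)

def rk4_step(fun, state, dt, theta):
    """One classical Runge-Kutta step for a tuple-valued state."""
    k1 = fun(state, theta)
    k2 = fun(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), theta)
    k3 = fun(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), theta)
    k4 = fun(tuple(s + dt * k for s, k in zip(state, k3)), theta)
    return tuple(
        s + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
        for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
    )

def integrate(fun, state, t0, t1, theta, steps=1000):
    """Fixed-step RK4 from t0 to t1 (t1 < t0 integrates backwards in time)."""
    dt = (t1 - t0) / steps
    for _ in range(steps):
        state = rk4_step(fun, state, dt, theta)
    return state

def adjoint_gradients(z0, theta, T, steps=1000):
    """Return (dL/dtheta, dL/dz0) for L = z(T) without storing the trajectory."""
    # Forward pass: keep only the final state z(T).
    (zT,) = integrate(lambda s, th: (dynamics(s[0], th),), (z0,), 0.0, T, theta, steps)
    # Backward pass from t = T to t = 0, starting at a(T) = dL/dz(T) = 1 and g(T) = 0.
    _, a0, g0 = integrate(augmented_dynamics, (zT, 1.0, 0.0), T, 0.0, theta, steps)
    return g0, a0
```

For z0 = 2.0, theta = 0.5 and T = 3.0, `adjoint_gradients` should match -T * z0 * exp(-theta * T) for dL/dtheta and exp(-theta * T) for dL/dz0 to within solver error. Memory stays constant in the number of steps because only the current augmented state is kept; the price is a second ODE solve, and reconstructing z backwards can accumulate numerical error for stiff or chaotic dynamics.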
In other words, our operators form the building blocks of a new deep motion processing framework that embeds the motion into a common latent space, shared by a collection of homeomorphic skeletons.
Hierarchical Structure, Motion Capture, Motion Retargeting, Motion Synthesis
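The notion of homeomorphic skeletons can be made concrete: two skeleton trees are homeomorphic when they become the same graph after suppressing joints of degree 2 (joints that only continue a chain). The sketch below computes that reduced graph for a skeleton given as an edge list. It illustrates the notion only and is not the pooling operator from the paper; `reduce_to_primal` and the joint names are hypothetical.

```python
from collections import defaultdict

def reduce_to_primal(edges):
    """Suppress degree-2 joints of a tree skeleton.

    edges: iterable of (joint_a, joint_b) pairs.
    Returns a set of frozenset edges connecting only the joints whose degree != 2.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    keep = {j for j, nbrs in adj.items() if len(nbrs) != 2}  # leaves and branch joints
    reduced = set()
    for start in keep:
        for nxt in adj[start]:
            prev, cur = start, nxt
            while cur not in keep:                 # walk along a chain of degree-2 joints
                (cur_next,) = adj[cur] - {prev}    # exactly one way forward
                prev, cur = cur, cur_next
            reduced.add(frozenset((start, cur)))   # both directions collapse to one edge
    return reduced
```

For instance, a skeleton with extra `spine` and `neck` joints between `hips`, `chest` and `head` reduces to the same six edges as one that connects those joints directly. Comparing edge sets like this only works when the kept joints share names; a general homeomorphism test would additionally need a tree-isomorphism check on the reduced graphs.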
We present a novel method for monocular hand shape and pose estimation that runs at an unprecedented 100 fps while achieving state-of-the-art accuracy.