Monocular 3D Human Pose Estimation
57 papers with code • 1 benchmark • 5 datasets
This task targets 3D human pose estimation from a single RGB camera.
Most implemented papers
DensePose: Dense Human Pose Estimation In The Wild
In this work, we establish dense correspondences between an RGB image and a surface-based representation of the human body, a task we refer to as dense human pose estimation.
A simple yet effective baseline for 3d human pose estimation
Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels.
Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image
We propose a unified formulation for the problem of 3D human pose estimation from a single raw RGB image that reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks.
3D human pose estimation in video with temporal convolutions and semi-supervised training
We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints.
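The back-projection step can be illustrated with a minimal sketch. It assumes a simple pinhole camera with known intrinsics and joints already expressed in camera space; the function names and the `focal` and `center` parameters are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def project_perspective(joints_3d, focal, center):
    """Pinhole projection of (J, 3) camera-space joints to (J, 2) pixel coordinates.

    Assumes every joint has a positive depth (z > 0).
    """
    joints_3d = np.asarray(joints_3d, dtype=float)
    xy = joints_3d[:, :2] / joints_3d[:, 2:3]  # divide x and y by depth
    return focal * xy + np.asarray(center, dtype=float)

def back_projection_error(joints_3d, keypoints_2d, focal, center):
    """Mean Euclidean distance between projected joints and the input 2D keypoints."""
    projected = project_perspective(joints_3d, focal, center)
    diff = projected - np.asarray(keypoints_2d, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))
```

An error of this kind needs no 3D labels, since it compares the estimated 3D pose only against the 2D keypoints it was derived from, which is why it is usable on unlabeled video.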
End-to-end Recovery of Human Shape and Pose
The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground truth 2D annotations.
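A keypoint reprojection loss of this kind might look like the sketch below. It assumes a weak-perspective camera (one scale and a 2D translation), an L1 penalty, and a per-joint visibility mask; these choices and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def weak_perspective_project(joints_3d, scale, trans):
    """Drop depth, then scale and translate: (J, 3) joints -> (J, 2) image points."""
    joints_3d = np.asarray(joints_3d, dtype=float)
    return scale * joints_3d[:, :2] + np.asarray(trans, dtype=float)

def keypoint_reprojection_loss(joints_3d, keypoints_2d, visibility, scale, trans):
    """L1 distance to annotated 2D keypoints, averaged over visible joints only."""
    projected = weak_perspective_project(joints_3d, scale, trans)
    per_joint = np.abs(projected - np.asarray(keypoints_2d, dtype=float)).sum(axis=1)
    visible = np.asarray(visibility, dtype=float)
    # Guard against division by zero when no joint is annotated as visible.
    return float((per_joint * visible).sum() / max(visible.sum(), 1.0))
```

Masking by visibility lets images with partial 2D annotations contribute to training without penalizing joints that were never labeled.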
Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach
We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure.
VIBE: Video Inference for Human Body Pose and Shape Estimation
Human motion is fundamental to understanding behavior.
Semantic Graph Convolutional Networks for 3D Human Pose Regression
In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression.
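For orientation, a plain graph convolution over a skeleton graph can be sketched as follows. This is the standard symmetric-normalized layer, not the semantic variant the paper proposes, and the three-joint chain in the usage note is a toy skeleton chosen for illustration.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops added."""
    adj = np.asarray(adj, dtype=float)
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(features, adj, weight):
    """One linear graph convolution: mix each joint with its neighbors, then project.

    features: (J, C_in) per-joint inputs, e.g. 2D coordinates.
    weight:   (C_in, C_out) learnable matrix; C_out = 3 for a 3D regression head.
    """
    return normalized_adjacency(adj) @ np.asarray(features, dtype=float) @ weight
```

With a chain adjacency `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]`, per-joint 2D inputs of shape `(3, 2)`, and a `(2, 3)` weight matrix, `graph_conv` returns one 3D vector per joint.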
XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera
The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow allowing for a drastically faster network without compromising accuracy.
Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image
Although significant improvement has been achieved recently in 3D human pose estimation, most of the previous methods treat only the single-person case.