3D Human Pose Estimation
311 papers with code • 25 benchmarks • 47 datasets
3D Human Pose Estimation is a computer vision task that involves estimating the 3D positions and orientations of body joints and bones from 2D images or videos. The goal is to reconstruct the 3D pose of a person, often in real time, for applications such as virtual reality, human-computer interaction, and motion analysis.
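To make the geometry of the task concrete, here is a minimal numpy sketch (not any specific method from the papers below; the intrinsics `f`, `cx`, `cy` and the toy skeleton are made-up values). It projects 3D joints through a pinhole camera and shows that the 2D observation can only be inverted once per-joint depth is known, which is exactly the quantity a 3D pose estimator must infer:

```python
import numpy as np

# Hypothetical camera intrinsics (assumed values for illustration)
f, cx, cy = 1000.0, 640.0, 360.0

def project(joints_3d):
    """Pinhole projection: camera-space 3D joints (N, 3) -> 2D pixels (N, 2)."""
    X, Y, Z = joints_3d.T
    return np.stack([f * X / Z + cx, f * Y / Z + cy], axis=1)

def back_project(joints_2d, depth):
    """Invert the projection given per-joint depth (N,).

    In practice depth is unknown and must be estimated; here we pass
    the ground-truth depth to show the mapping is exactly invertible.
    """
    u, v = joints_2d.T
    X = (u - cx) * depth / f
    Y = (v - cy) * depth / f
    return np.stack([X, Y, depth], axis=1)

# Toy 3-joint skeleton in camera coordinates (metres), ~3 m from the camera
joints = np.array([[0.0, -0.5, 3.0], [0.2, 0.0, 3.1], [-0.2, 0.0, 2.9]])
px = project(joints)
recovered = back_project(px, joints[:, 2])  # exact recovery with known depth
```

The ambiguity this exposes (many 3D poses share one 2D projection) is why the methods below lean on priors such as body models, temporal context, or multi-hypothesis generation.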
Latest papers
Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey
To the best of our knowledge, this survey is arguably the first to comprehensively cover deep learning methods for 3D human pose estimation, including both single-person and multi-person approaches, as well as human mesh recovery, encompassing methods based on explicit models and implicit representations.
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
We present Multi-HMR, a strong single-shot model for multi-person 3D human mesh recovery from a single RGB image.
Lester: rotoscope animation through video object segmentation and tracking
This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos.
Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers
Due to the challenges of data collection, mainstream 3D human pose estimation datasets are primarily composed of multi-view video data captured in laboratory environments, which contains rich spatial-temporal correlation information beyond the image frame content itself.
Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework
However, there is still ample room for improvement as these methods often overlook the exploration of correlation between the 2D and 3D joint-level features.
Diffusion-based Pose Refinement and Multi-hypothesis Generation for 3D Human Pose Estimation
To address these two challenges, we propose a diffusion-based refinement framework called DRPose, which refines the output of deterministic models by reverse diffusion and achieves more suitable multi-hypothesis prediction for the current pose benchmark by multi-step refinement with multiple noises.
STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion
This method can remarkably improve the smoothness of recovery results from video.
3D-LFM: Lifting Foundation Model
The lifting of 3D structure and camera from 2D landmarks is a cornerstone of the entire discipline of computer vision.
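A classical instance of this lifting problem is rigid structure-from-motion by rank-3 factorization (Tomasi-Kanade). The sketch below uses synthetic data of our own making, not anything from the 3D-LFM paper: zero-mean 3D landmarks are observed under orthographic projection in several frames, and a truncated SVD recovers cameras and structure jointly, up to a 3x3 linear ambiguity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N zero-mean 3D landmarks observed in F frames under
# orthographic projection (frame orientations are random for illustration).
N, F = 15, 8
S = rng.standard_normal((3, N))
S -= S.mean(axis=1, keepdims=True)           # centre the structure

def random_basis(rng):
    """Random orthonormal 3x3 basis via QR (sign-fixed for uniqueness)."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    return Q * np.sign(np.diag(R))

# Stack the per-frame 2D observations into the 2F x N measurement matrix W;
# each frame contributes the first two rows of its orthonormal basis.
W = np.vstack([random_basis(rng)[:2] @ S for _ in range(F)])

# Rank-3 factorization W = M @ S_hat: in the noise-free case W has rank 3,
# so the truncated SVD reconstructs it exactly.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
M = U[:, :3] * np.sqrt(s[:3])                # 2F x 3 stacked camera rows
S_hat = np.sqrt(s[:3])[:, None] * Vt[:3]     # 3 x N recovered structure
```

With noisy observations the same truncated SVD gives the best rank-3 approximation in the least-squares sense, which is why factorization remains a useful baseline for landmark lifting.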
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
We address these limitations with WHAM (World-grounded Humans with Accurate Motion), which accurately and efficiently reconstructs 3D human motion in a global coordinate system from video.
VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data
To the best of our knowledge, VoxelKP is the first single-staged, fully sparse network that is specifically designed for addressing the challenging task of 3D keypoint estimation from LiDAR data, achieving state-of-the-art performances.