3D Human Pose Estimation via Explicit Compositional Depth Maps

AAAI 2020 · Haiping Wu, Bin Xiao

In this work, we tackle the problem of estimating 3D human pose in camera space from a monocular image. First, we propose to use densely generated limb depth maps, which are well aligned with image cues, to ease the learning of body-joint depths. Then, we design a lifting module from 2D pixel coordinates to 3D camera coordinates that explicitly takes the depth values as input and is consistent with the camera perspective projection model. We show that our method achieves superior performance on the large-scale 3D pose datasets Human3.6M and MPI-INF-3DHP, setting a new state of the art.
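The lifting step described here matches the standard pinhole back-projection: given a joint's pixel coordinates and its absolute depth, the camera-space position follows in closed form from the camera intrinsics. Below is a minimal NumPy sketch of that geometry, not the paper's module itself; the intrinsic matrix `K` and per-joint depths are assumed to be available:

```python
import numpy as np

def backproject(uv, z, K):
    """Lift 2D pixel coordinates to 3D camera coordinates given per-joint
    depth, by inverting the pinhole perspective projection.

    uv : (J, 2) pixel coordinates of the J joints
    z  : (J,)   estimated absolute depth of each joint (e.g. in mm)
    K  : (3, 3) intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (uv[:, 0] - cx) * z / fx   # X = (u - cx) * Z / fx
    y = (uv[:, 1] - cy) * z / fy   # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=-1)  # (J, 3) camera-space joints
```

Because this mapping is the exact inverse of perspective projection, any error in the recovered 3D joints traces back directly to the estimated 2D coordinates and depths.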


Results from the Paper


Task                     | Dataset      | Model                                              | Metric             | Value | Global Rank
-------------------------|--------------|----------------------------------------------------|--------------------|-------|------------
3D Human Pose Estimation | Human3.6M    | Explicit Compositional Depth Maps (HRNet-W32 MPII) | Average MPJPE (mm) | 43.2  | # 89
3D Human Pose Estimation | Human3.6M    | Explicit Compositional Depth Maps (HRNet-W32 MPII) | PA-MPJPE           | 34.6  | # 24
3D Human Pose Estimation | Human3.6M    | Explicit Compositional Depth Maps (ResNet-50 MPII) | Average MPJPE (mm) | 47.3  | # 132
3D Human Pose Estimation | Human3.6M    | Explicit Compositional Depth Maps (ResNet-50 MPII) | PA-MPJPE           | 37.3  | # 35
3D Human Pose Estimation | MPI-INF-3DHP | Explicit Compositional Depth Maps                  | PCK                | 93.2  | # 21
3D Human Pose Estimation | MPI-INF-3DHP | Explicit Compositional Depth Maps                  | AUC                | 62.4  | # 23

Both Human3.6M entries are monocular and do not use 2D ground-truth joints.
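For reference, the two Human3.6M metrics above are the standard MPJPE (mean per-joint position error, in mm) and PA-MPJPE (MPJPE after a rigid Procrustes alignment of the prediction to the ground truth). Below is a minimal NumPy sketch of both, assuming (J, 3) joint arrays in mm; the helper names are ours, not from the paper's evaluation code:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE after similarity (Procrustes) alignment of pred to gt.

    The optimal scale, rotation, and translation are found in closed form
    via the SVD of the cross-covariance matrix (Kabsch/Umeyama).
    """
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g           # center both point sets
    U, s, Vt = np.linalg.svd(p.T @ g)       # 3x3 cross-covariance SVD
    if np.linalg.det(Vt.T @ U.T) < 0:       # guard against reflections
        Vt[-1] *= -1
        s[-1] *= -1
    R = Vt.T @ U.T                          # optimal rotation
    scale = s.sum() / (p ** 2).sum()        # optimal isotropic scale
    aligned = scale * p @ R.T + mu_g        # map pred onto gt
    return mpjpe(aligned, gt)
```

Because PA-MPJPE factors out global scale, rotation, and translation, it is always at most the corresponding MPJPE, consistent with the pairs reported above.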

Methods


No methods listed for this paper.