Learning 3D Human Pose from Structure and Motion

3D human pose estimation from a single image is a challenging problem, especially for in-the-wild settings due to the lack of 3D annotated data. We propose two anatomically inspired loss functions and use them with a weakly-supervised learning framework to jointly learn from large-scale in-the-wild 2D and indoor/synthetic 3D data. We also present a simple temporal network that exploits temporal and structural cues present in predicted pose sequences to temporally harmonize the pose estimations. We carefully analyze the proposed contributions through loss surface visualizations and sensitivity analysis to facilitate deeper understanding of their working mechanism. Our complete pipeline improves the state-of-the-art by 11.8% and 12% on Human3.6M and MPI-INF-3DHP, respectively, and runs at 30 FPS on a commodity graphics card.
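To make the idea of an anatomically inspired loss concrete, here is a minimal sketch of one possible term: a penalty on left/right bone-length asymmetry in a predicted 3D pose. The abstract does not specify the paper's two losses, so this is an illustrative assumption rather than the authors' formulation; the joint layout `JOINTS`, the bone list `SYMMETRIC_BONES`, and the function names are all hypothetical.

```python
import numpy as np

# Hypothetical joint layout, for illustration only (not the paper's skeleton).
JOINTS = {
    "l_shoulder": 0, "l_elbow": 1, "l_wrist": 2,
    "r_shoulder": 3, "r_elbow": 4, "r_wrist": 5,
}

# Pairs of (left bone, right bone) that should have equal length.
SYMMETRIC_BONES = [
    (("l_shoulder", "l_elbow"), ("r_shoulder", "r_elbow")),
    (("l_elbow", "l_wrist"), ("r_elbow", "r_wrist")),
]


def bone_length(pose, bone):
    """Euclidean length of one bone; pose has shape (num_joints, 3)."""
    start, end = bone
    return np.linalg.norm(pose[JOINTS[start]] - pose[JOINTS[end]])


def symmetry_penalty(pose):
    """Sum of absolute left/right bone-length differences (assumed form)."""
    return float(sum(
        abs(bone_length(pose, left) - bone_length(pose, right))
        for left, right in SYMMETRIC_BONES
    ))
```

A term like this needs no 3D ground truth, which is why it could in principle be applied to 2D-only in-the-wild data in a weakly-supervised setup.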

ECCV 2018
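Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Monocular 3D Human Pose Estimation | Human3.6M | TP-Net | Average MPJPE (mm) | 52.1 | # 24
Monocular 3D Human Pose Estimation | Human3.6M | TP-Net | Use Video Sequence | Yes | # 1
Monocular 3D Human Pose Estimation | Human3.6M | TP-Net | Frames Needed | 20 | # 26
Monocular 3D Human Pose Estimation | Human3.6M | TP-Net | Need Ground Truth 2D Pose | No | # 1
3D Human Pose Estimation | Human3.6M | TP-Net | Average MPJPE (mm) | 52.1 | # 207
3D Human Pose Estimation | Human3.6M | TP-Net | PA-MPJPE | 36.3 | # 43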
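The two error metrics in these results can be sketched as follows. MPJPE is the mean Euclidean distance between predicted and ground-truth joints; PA-MPJPE is the same error measured after a Procrustes (similarity) alignment of the prediction to the ground truth. This is a minimal NumPy sketch for a single pose, not the benchmark's official evaluation code; the function names are assumptions.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error; pred and gt have shape (J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred to gt with the best-fit scale, rotation, and translation."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_pred, gt - mu_gt
    # SVD of the 3x3 cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    flip = np.diag([1.0, 1.0, np.sign(np.linalg.det(u @ vt))])
    rot = u @ flip @ vt
    scale = (s * np.diag(flip)).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_gt


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Both functions return the error in whatever unit the joint coordinates use, so poses expressed in millimetres give results in mm. Because the alignment removes global scale, rotation, and translation, PA-MPJPE is never larger than MPJPE in expectation, which is consistent with the 36.3 versus 52.1 figures reported here.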

Results from Other Papers

Task | Dataset | Model | Metric Name | Metric Value | Rank
3D Human Pose Estimation | 3DPW | TP-Net | PA-MPJPE | 92.2 | # 131
