Exploiting temporal context for 3D human pose estimation in the wild

CVPR 2019 · Anurag Arnab, Carl Doersch, Andrew Zisserman

We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos. Unlike previous algorithms which operate on single frames, we show that reconstructing a person over an entire sequence provides extra constraints that can resolve ambiguities. This is because videos often give multiple views of a person, yet the overall body shape does not change and 3D positions vary slowly. Our method improves not only on standard mocap-based datasets like Human3.6M -- where we show quantitative improvements -- but also on challenging in-the-wild datasets such as Kinetics. Building upon our algorithm, we present a new dataset of more than 3 million frames of YouTube videos from Kinetics with automatically generated 3D poses and meshes. We show that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data, by evaluating on the 3DPW and HumanEva datasets.
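The abstract's key constraints -- that 2D detections must be explained by the 3D reconstruction, that 3D positions vary slowly over time, and that body shape stays constant across the sequence -- can be combined into a single sequence-level cost. The sketch below is an illustrative simplification, not the paper's implementation: the function and weight names are assumptions, and a weak-perspective projection stands in for the full camera model.

```python
import numpy as np

def project(points_3d, focal=1000.0):
    """Toy perspective projection of (T, J, 3) joints to 2D pixels.
    Stand-in for the full camera model used in the paper."""
    return focal * points_3d[..., :2] / points_3d[..., 2:3]

def sequence_cost(poses_3d, keypoints_2d, shapes,
                  w_reproj=1.0, w_smooth=1.0, w_shape=1.0):
    """Bundle-adjustment-style cost over a whole video (illustrative).

    poses_3d:     (T, J, 3) per-frame 3D joint positions
    keypoints_2d: (T, J, 2) detected 2D keypoints
    shapes:       (T, S)    per-frame body-shape parameters
    """
    # Reprojection: the 3D joints should explain the 2D detections.
    reproj = np.sum((project(poses_3d) - keypoints_2d) ** 2)
    # Temporal smoothness: 3D positions vary slowly between frames.
    smooth = np.sum((poses_3d[1:] - poses_3d[:-1]) ** 2)
    # Shape constancy: one person, one body shape for the sequence.
    shape_const = np.sum((shapes - shapes.mean(axis=0)) ** 2)
    return w_reproj * reproj + w_smooth * smooth + w_shape * shape_const
```

Minimizing this cost jointly over all frames (e.g. with a nonlinear least-squares solver) is what distinguishes the sequence-level approach from per-frame estimation: the smoothness and shape terms let well-constrained frames disambiguate poorly constrained ones.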


Results from the Paper


 Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| 3D Human Pose Estimation | 3DPW | Bundle Adjustment | PA-MPJPE | 72.2 | #116 |
| Monocular 3D Human Pose Estimation | Human3.6M | Bundle Adjustment | Use Video Sequence | Yes | #1 |
| Monocular 3D Human Pose Estimation | Human3.6M | Bundle Adjustment | Frames Needed | 190 | #32 |
| Monocular 3D Human Pose Estimation | Human3.6M | Bundle Adjustment | Need Ground Truth 2D Pose | No | #1 |
| 3D Human Pose Estimation | Human3.6M | Bundle Adjustment (GTi) | Average MPJPE (mm) | 63.3 | #261 |
| Monocular 3D Human Pose Estimation | Human3.6M | Bundle Adjustment (GTi) | Average MPJPE (mm) | 63.3 | #30 |
| 3D Human Pose Estimation | Human3.6M | Bundle Adjustment | Average MPJPE (mm) | 77.8 | #290 |
| 3D Human Pose Estimation | Human3.6M | Bundle Adjustment | PA-MPJPE | 41.6 | #65 |

Methods


No methods listed for this paper.