Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows

3D human pose estimation from monocular images is a highly ill-posed problem due to depth ambiguities and occlusions. Nonetheless, most existing works ignore these ambiguities and only estimate a single solution. In contrast, we generate a diverse set of hypotheses that represents the full posterior distribution of feasible 3D poses. To this end, we propose a normalizing-flow-based method that exploits the deterministic 3D-to-2D mapping to solve the ambiguous inverse 2D-to-3D problem. Additionally, uncertain detections and occlusions are effectively modeled by incorporating uncertainty information from the 2D detector as a condition. Further keys to success are a learned 3D pose prior and a generalization of the best-of-M loss. We evaluate our approach on the two benchmark datasets Human3.6M and MPI-INF-3DHP, outperforming all comparable methods in most metrics. The implementation is available on GitHub.
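The core idea in the abstract (a deterministic 3D-to-2D mapping, several 3D hypotheses for one 2D input, and a best-of-M loss) can be illustrated with a toy sketch. This is not the paper's implementation: it assumes an orthographic projection, samples the unobserved depth from a fixed Gaussian instead of a learned, conditioned normalizing flow, and uses the plain best-of-M loss rather than the generalization the paper proposes. The names `project`, `sample_hypotheses`, and `best_of_m_loss` are hypothetical.

```python
import numpy as np


def project(pose_3d):
    """Toy deterministic 3D-to-2D mapping (assumption: orthographic, drops depth)."""
    return pose_3d[..., :2]


def sample_hypotheses(pose_2d, num_hypotheses, depth_std=1.0, rng=None):
    """Lift one 2D pose of shape (J, 2) to hypotheses of shape (M, J, 3).

    The depth lost by the projection is treated as a latent variable and
    sampled from a Gaussian. A learned flow would replace this fixed prior.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_joints = pose_2d.shape[0]
    depth = rng.normal(0.0, depth_std, size=(num_hypotheses, num_joints, 1))
    xy = np.broadcast_to(pose_2d, (num_hypotheses, num_joints, 2))
    return np.concatenate([xy, depth], axis=-1)


def best_of_m_loss(hypotheses, target_3d):
    """Plain best-of-M loss: error of the hypothesis closest to the target."""
    per_joint = np.linalg.norm(hypotheses - target_3d, axis=-1)  # (M, J)
    per_hypothesis = per_joint.mean(axis=-1)                     # (M,)
    return float(per_hypothesis.min())
```

Every sampled hypothesis reprojects to the same 2D pose, which is the ambiguity the abstract describes; the best-of-M loss only penalizes the closest hypothesis, so the remaining ones are free to cover other plausible poses.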

Venue: ICCV 2021

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| 3D Human Pose Estimation | Human3.6M | Probabilistic Monocular (T=200) | Average MPJPE (mm) | 44.3 | #99 |
| | | | Using 2D ground-truth joints | No | #2 |
| | | | Multi-View or Monocular | Monocular | #1 |
| 3D Human Pose Estimation | Human3.6M | Probabilistic Monocular (T=1) | Average MPJPE (mm) | 61.8 | #253 |
| | | | Using 2D ground-truth joints | No | #2 |
| | | | Multi-View or Monocular | Monocular | #1 |
| 3D Human Pose Estimation | MPI-INF-3DHP | Probabilistic Monocular | PCK | 84.3 | #46 |
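
For readers unfamiliar with the two metrics, the following sketch shows how MPJPE and PCK are typically computed. It rests on assumptions not stated on this page: that T denotes the number of sampled hypotheses and that the hypothesis closest to the ground truth is the one scored, and that PCK uses a 150 mm threshold, a common choice for MPI-INF-3DHP that is exposed here as a parameter. Function names are hypothetical, and any alignment steps applied before evaluation are omitted.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def pck(pred, target, threshold_mm=150.0):
    """Percentage of joints whose error is within the threshold (assumed 150 mm)."""
    dists = np.linalg.norm(pred - target, axis=-1)
    return float((dists <= threshold_mm).mean() * 100.0)


def best_hypothesis_mpjpe(hypotheses, target):
    """Assumed multi-hypothesis protocol: score the hypothesis closest to the target."""
    return min(mpjpe(h, target) for h in hypotheses)
```

Under this reading, MPJPE is lower-is-better and PCK is higher-is-better, and scoring the best of many hypotheses can only match or improve on scoring a single one.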
