Learnable Triangulation of Human Pose

We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with the addition of confidence weights estimated from the input images. The second solution is based on a novel method of volumetric aggregation of intermediate 2D backbone feature maps. The aggregated volume is then refined via 3D convolutions that produce the final 3D joint heatmaps and allow the network to model a human pose prior. Crucially, both approaches are end-to-end differentiable, which allows us to directly optimize the target metric. We demonstrate the transferability of both solutions across datasets and considerably improve the multi-view state of the art on the Human3.6M dataset. Video demonstration, annotations and additional materials will be posted on our project page (https://saic-violet.github.io/learnable-triangulation).
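The first solution amounts to weighted algebraic (DLT) triangulation: each view contributes two linear constraints on the joint's 3D position, scaled by a per-view confidence weight. A minimal NumPy sketch of this idea is below (the function name and exact interface are illustrative, not the authors' code; the paper implements this with a differentiable SVD so the weights can be learned end-to-end):

```python
import numpy as np

def weighted_triangulate(proj_mats, points_2d, weights):
    """Algebraic (DLT) triangulation with per-view confidence weights.

    proj_mats: list of (3, 4) camera projection matrices
    points_2d: (n_views, 2) observed 2D coordinates of the same joint
    weights:   (n_views,) confidences estimated from the input images
    Returns the (3,) triangulated 3D point.
    """
    rows = []
    for P, (u, v), w in zip(proj_mats, points_2d, weights):
        # Each view gives two homogeneous linear constraints on the
        # 3D point; low-confidence views contribute weaker constraints.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # Homogeneous least squares: the right singular vector with the
    # smallest singular value minimizes ||A y|| subject to ||y|| = 1.
    _, _, vt = np.linalg.svd(A)
    y = vt[-1]
    return y[:3] / y[3]  # de-homogenize
```

Because every step (matrix assembly, SVD, division) is differentiable, gradients from a 3D loss can flow back into the network that predicts the 2D points and the confidence weights.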

ICCV 2019
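The second, volumetric solution fills a 3D voxel grid around the subject by projecting each voxel into every camera view and sampling the 2D backbone feature maps there; the aggregated volume is then refined by 3D convolutions. A simplified NumPy sketch of the unprojection step, under assumed shapes and with mean aggregation (the function name is hypothetical, not from the authors' code):

```python
import numpy as np

def unproject_features(feature_maps, proj_mats, grid):
    """Fill a voxel volume by sampling each view's 2D feature maps.

    feature_maps: (n_views, C, H, W) intermediate backbone features
    proj_mats:    list of (3, 4) matrices mapping world coordinates
                  to feature-map pixel coordinates
    grid:         (N, 3) world coordinates of the voxel centers
    Returns the per-voxel mean over views, shape (N, C).
    """
    n_views, C, H, W = feature_maps.shape
    N = grid.shape[0]
    hom = np.concatenate([grid, np.ones((N, 1))], axis=1)   # (N, 4)
    samples = np.zeros((n_views, N, C))
    for j, P in enumerate(proj_mats):
        uvw = hom @ P.T                     # project voxels, (N, 3)
        uv = uvw[:, :2] / uvw[:, 2:3]       # perspective divide
        # Nearest-neighbour sampling for brevity; the paper uses
        # bilinear sampling so gradients flow back to the 2D backbone.
        x = np.clip(np.rint(uv[:, 0]).astype(int), 0, W - 1)
        y = np.clip(np.rint(uv[:, 1]).astype(int), 0, H - 1)
        samples[j] = feature_maps[j][:, y, x].T   # (N, C)
    return samples.mean(axis=0)
```

In the full method the resulting volume is passed through a 3D convolutional network that outputs per-joint 3D heatmaps, from which joint positions are read out differentiably.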

Results from the Paper


Ranked #3 on 3D Human Pose Estimation on Panoptic (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| 3D Human Pose Estimation | Human3.6M | Learnable Triangulation of Human Pose (filtered) | Average MPJPE (mm) | 17.7 | #4 |
| | | | Multi-View or Monocular | Multi-View | #1 |
| 3D Human Pose Estimation | Human3.6M | Learnable Triangulation of Human Pose (Monocular) | Average MPJPE (mm) | 49.9 | #162 |
| 3D Human Pose Estimation | Human3.6M | Learnable Triangulation of Human Pose | Average MPJPE (mm) | 20.8 | #9 |
| | | | Using 2D ground-truth joints | No | #2 |
| | | | Multi-View or Monocular | Multi-View | #1 |
| 3D Human Pose Estimation | Panoptic | Learnable Triangulation of Human Pose | Average MPJPE (mm) | 13.7 | #3 |

Methods


No methods listed for this paper.