Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry

Epipolar constraints are at the core of feature matching and depth estimation in current multi-person multi-camera 3D human pose estimation methods. Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances mainly due to two sources of ambiguity. The first is the mismatch of human joints resulting from the simple cues provided by the Euclidean distances between joints and epipolar lines. The second is the lack of robustness from the naive formulation of the problem as a least squares minimization. In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation. Our method consists of two key components: a graph model for fast cross-view matching, and a maximum a posteriori (MAP) estimator for the reconstruction of the 3D human poses. We demonstrate the effectiveness and superiority of our proposed method on four benchmark datasets.

PDF Abstract ECCV 2020 PDF ECCV 2020 Abstract

Results from the Paper

Ranked #12 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)

     Get a GitHub badge
Task Dataset Model Metric Name Metric Value Global Rank Uses Extra
Training Data
Result Benchmark
3D Multi-Person Pose Estimation Panoptic MVG Average MPJPE (mm) 50 # 12


No methods listed for this paper. Add relevant methods here