On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation

29 Mar 2022  ·  Soumava Kumar Roy, Leonardo Citraro, Sina Honari, Pascal Fua

Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant. However, as the acquisition of ground-truth 3D labels is labor intensive and time consuming, recent attention has shifted towards semi- and weakly-supervised learning. Generating an effective form of supervision from few annotations still poses a major challenge in crowded scenes. In this paper we propose to impose multi-view geometrical constraints by means of a weighted differentiable triangulation and to use it as a form of self-supervision when no labels are available. We therefore train a 2D pose estimator in such a way that its predictions correspond to the re-projection of the triangulated 3D pose, and train an auxiliary network on them to produce the final 3D poses. We complement the triangulation with a weighting mechanism that alleviates the impact of noisy predictions caused by self-occlusion or occlusion from other subjects. We demonstrate the effectiveness of our semi-supervised approach on the Human3.6M and MPI-INF-3DHP datasets, as well as on a new multi-view multi-person dataset that features occlusion.
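To make the idea of weighted triangulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard weighted direct linear transform (DLT) solved with an SVD, written in NumPy for readability; the paper's exact formulation, weighting scheme, and loss may differ, and a trainable version would use an automatic-differentiation framework instead. The names `triangulate_weighted`, `project`, `reprojection_loss`, and `camera`, as well as the toy camera setup, are hypothetical.

```python
import numpy as np


def triangulate_weighted(proj_mats, points_2d, weights):
    """Triangulate one joint from V views with a weighted DLT (illustrative).

    proj_mats: (V, 3, 4) camera projection matrices
    points_2d: (V, 2) detected 2D joint locations in pixels
    weights:   (V,) per-view confidences; 0 removes a view's influence
    """
    proj_mats = np.asarray(proj_mats, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Each view contributes two linear constraints on the homogeneous point:
    # u * P[2] - P[0] = 0 and v * P[2] - P[1] = 0.
    rows_u = points_2d[:, 0:1] * proj_mats[:, 2] - proj_mats[:, 0]
    rows_v = points_2d[:, 1:2] * proj_mats[:, 2] - proj_mats[:, 1]
    A = np.concatenate([rows_u, rows_v], axis=0)        # (2V, 4)
    w = np.concatenate([weights, weights])[:, None]     # (2V, 1)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(w * A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def reprojection_loss(proj_mats, points_2d, X, weights):
    """Weighted mean squared distance between 2D detections and re-projections."""
    reproj = np.stack([project(P, X) for P in proj_mats])
    sq_err = np.sum((reproj - points_2d) ** 2, axis=1)
    return float(np.sum(weights * sq_err) / np.sum(weights))


# Toy setup (assumed): three identical pinhole cameras, shifted by 1 unit.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])


def camera(t):
    return K @ np.hstack([np.eye(3), np.reshape(t, (3, 1))])


proj_mats = np.stack([camera([0, 0, 0]), camera([-1, 0, 0]), camera([0, -1, 0])])
X_true = np.array([0.2, -0.1, 5.0])
points_2d = np.stack([project(P, X_true) for P in proj_mats])
points_2d[2] += 40.0  # simulate a badly localized (e.g. occluded) detection in view 3

X_naive = triangulate_weighted(proj_mats, points_2d, np.ones(3))
X_weighted = triangulate_weighted(proj_mats, points_2d, np.array([1.0, 1.0, 0.0]))
```

In this toy case, down-weighting the corrupted view lets the remaining two views determine the 3D point, which is the intuition behind using confidences to limit the effect of occluded detections. In a self-supervised setting, a quantity like `reprojection_loss` would serve as the training signal for the 2D estimator.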

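| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Weakly-supervised 3D Human Pose Estimation | Human3.6M | Triangulation | Average MPJPE (mm) | 64.7 | #15 |
| Weakly-supervised 3D Human Pose Estimation | Human3.6M | Triangulation | PA-MPJPE | 52.1 | #4 |
| Weakly-supervised 3D Human Pose Estimation | MPI-INF-3DHP | Triangulation | MPJPE | 118.4 | #2 |
| Weakly-supervised 3D Human Pose Estimation | MPI-INF-3DHP | Triangulation | PCK | 73.4 | #3 |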
