SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers

19 Apr 2024 · Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, Ali Etemad

We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions, even when heavily noisy, to pose and shape representations. This module integrates prior knowledge about pose space and infers the full pose state at runtime. Separating the 3D keypoint detection and inverse-kinematic problems, along with the expressive representations learned by our skeletal transformer, enhances the generalization of our method to unseen noisy data. We evaluate our method on three public datasets in both in-distribution and out-of-distribution settings, and observe strong performance relative to prior work. Moreover, ablation experiments demonstrate the impact of each module of our architecture. Finally, we study how our method handles noise and heavy occlusions and find it considerably more robust than other solutions.
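The first stage described above lifts per-view 2D keypoints to 3D joint positions. The abstract does not say how this lifting is done, so the following is only a minimal sketch of one common choice, linear (DLT) triangulation from calibrated views. The function names `triangulate_point` and `project`, and the assumption that 3x4 camera projection matrices are available, are illustrative and not taken from the paper.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (3,) with a 3x4 projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint from N >= 2 calibrated views.

    proj_mats: sequence of N projection matrices, each 3x4 (assumed known).
    points_2d: sequence of N 2D detections (u, v), one per view.
    Returns the 3D point (3,) that minimizes the algebraic error.
    """
    rows = []
    for P, (u, v) in zip(np.asarray(proj_mats, dtype=float),
                         np.asarray(points_2d, dtype=float)):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]
```

Applied to every joint, this yields the kind of noisy 3D skeleton that an inverse-kinematic module would then map to pose and shape parameters; with real detections, per-view confidences are often used to weight the rows of `A`.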


Results from the Paper


 Ranked #1 on Multi-view 3D Human Pose Estimation on MPI-INF-3DHP (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| 3D Human Pose Estimation | Human3.6M | SkelFormer (CPN) | Average MPJPE (mm) | 33.5 | #55 |
| | | | Using 2D ground-truth joints | No | #2 |
| | | | Multi-View or Monocular | Multi-View | #1 |
| | | | PA-MPJPE | 27.8 | #8 |
| 3D Human Pose Estimation | Human3.6M | SkelFormer (LT) | Average MPJPE (mm) | 25.2 | #20 |
| | | | Using 2D ground-truth joints | No | #2 |
| | | | Multi-View or Monocular | Multi-View | #1 |
| | | | PA-MPJPE | 20.6 | #3 |
| Multi-view 3D Human Pose Estimation | MPI-INF-3DHP | SkelFormer (HRNet - eval only) | PCK | 97.5 | #2 |
| | | | AUC | 67.4 | #1 |
| | | | PA-MPJPE | 54.8 | #1 |
| 3D Human Pose Estimation | RICH | SkelFormer (HRNet - eval only) | MPJPE | 44.2 | #2 |
| | | | PA-MPJPE | 35.6 | #2 |
| | | | MPVPE | 39.9 | #1 |
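
For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how MPJPE and PA-MPJPE are commonly computed for a single pose. MPJPE is the mean Euclidean distance between predicted and ground-truth joints; PA-MPJPE is the same error after a Procrustes (similarity) alignment of the prediction to the ground truth. The function names are illustrative, and the exact evaluation protocol behind the reported numbers (joint set, root alignment, averaging over frames) is not specified on this page, so treat this as an assumption-laden illustration.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error for one pose; pred and gt are (J, 3) arrays."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred to gt with the best-fit scale, rotation, and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Cross-covariance between the centered point sets.
    H = p.T @ g
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.ones(3)
    if np.linalg.det(Vt.T @ U.T) < 0:
        d[-1] = -1.0
    R = Vt.T @ np.diag(d) @ U.T
    scale = (S * d).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

If the joint coordinates are given in millimetres, both functions return millimetres. Because PA-MPJPE removes global scale, rotation, and translation, it is never larger than the best rigidly aligned error and isolates errors in the articulated pose itself.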
