TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking

We consider the task of 3D pose estimation and tracking of multiple people seen in an arbitrary number of camera feeds. We propose TesseTrack, a novel top-down approach that simultaneously reasons about multiple individuals' 3D body joint reconstructions and associations in space and time in a single end-to-end learnable framework. At the core of our approach is a novel spatio-temporal formulation that operates in a common voxelized feature space aggregated from single or multiple camera views. After a person detection step, a 4D CNN produces short-term person-specific representations which are then linked across time by a differentiable matcher. The linked descriptions are then merged and deconvolved into 3D poses. This joint spatio-temporal formulation contrasts with previous piece-wise strategies that treat 2D pose estimation, 2D-to-3D lifting, and 3D pose tracking as independent sub-problems that are error-prone when solved in isolation. Furthermore, unlike previous methods, TesseTrack is robust to changes in the number of camera views and achieves very good results even if a single view is available at inference time. Quantitative evaluation of 3D pose reconstruction accuracy on standard benchmarks shows significant improvements over the state of the art. Evaluation of multi-person articulated 3D pose tracking in our novel evaluation framework demonstrates the superiority of TesseTrack over strong baselines.

CVPR 2021

Results from the Paper


 Ranked #1 on 3D Human Pose Estimation on Panoptic (using extra training data)

Task                            | Dataset   | Model                           | Metric Name                  | Metric Value | Global Rank
3D Multi-Person Pose Estimation | Campus    | TesseTrack                      | PCP3D                        | 97.4         | #1
3D Pose Estimation              | Human3.6M | TesseTrack                      | Average MPJPE (mm)           | 18.7         | #1
3D Human Pose Estimation        | Human3.6M | TesseTrack (Monocular)          | Average MPJPE (mm)           | 44.6         | #103
                                |           |                                 | Using 2D ground-truth joints | No           | #2
                                |           |                                 | Multi-View or Monocular      | Monocular    | #1
3D Human Pose Estimation        | Human3.6M | TesseTrack (Multi-View)         | Average MPJPE (mm)           | 18.7         | #5
                                |           |                                 | Using 2D ground-truth joints | No           | #2
                                |           |                                 | Multi-View or Monocular      | Multi-View   | #1
3D Human Pose Tracking          | Panoptic  | TesseTrack                      | 3DMOTA                       | 94.1         | #1
3D Human Pose Estimation        | Panoptic  | TesseTrack Multi-View (5 views) | Average MPJPE (mm)           | 7.3          | #1
3D Multi-Person Pose Estimation | Panoptic  | TesseTrack                      | Average MPJPE (mm)           | 7.3          | #1
3D Human Pose Estimation        | Panoptic  | TesseTrack Monocular            | Average MPJPE (mm)           | 18.9         | #5
3D Multi-Person Pose Estimation | Shelf     | TesseTrack (paper)              | PCP3D                        | 98.2         | #1
3D Multi-Person Pose Estimation | Shelf     | TesseTrack (correct)            | PCP3D                        | 97.9         | #4
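
For readers unfamiliar with the two accuracy metrics in the table, the following is a minimal sketch of how MPJPE and PCP are commonly computed. It is not taken from the TesseTrack evaluation code. MPJPE is the mean Euclidean distance between predicted and ground-truth joints. PCP counts a limb as correct when the average error of its two endpoints is at most a fraction `alpha` of the ground-truth limb length; `alpha = 0.5` is the usual convention and is assumed here. The function names, the toy skeleton, and the joint coordinates are all hypothetical.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def pcp(pred, gt, limbs, alpha=0.5):
    """Percentage of correct parts (limbs), using the common alpha = 0.5 rule.

    A limb (a, b) is correct when the mean endpoint error is at most
    alpha times the ground-truth limb length.
    """
    correct = 0
    for a, b in limbs:
        limb_len = math.dist(gt[a], gt[b])
        err = (math.dist(pred[a], gt[a]) + math.dist(pred[b], gt[b])) / 2
        if err <= alpha * limb_len:
            correct += 1
    return 100.0 * correct / len(limbs)

# Toy 3-joint skeleton in millimetres (made-up values, for illustration only).
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 300.0), (0.0, 0.0, 600.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 0.0, 300.0), (0.0, 400.0, 600.0)]
limbs = [(0, 1), (1, 2)]  # hypothetical limb definition as joint index pairs

print(mpjpe(pred, gt))       # joint errors are 5, 0 and 400 mm
print(pcp(pred, gt, limbs))  # one of the two limbs passes the threshold
```

Lower is better for MPJPE and higher is better for PCP, which is why a 7.3 mm MPJPE and a 97.4 PCP3D both indicate strong results in the table above. Benchmarks differ in details such as skeleton alignment and which limbs are scored, so reported numbers are only comparable within one evaluation protocol.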
