TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
We consider the task of 3D pose estimation and tracking of multiple people seen in an arbitrary number of camera feeds. We propose TesseTrack, a novel top-down approach that simultaneously reasons about multiple individuals' 3D body joint reconstructions and associations in space and time in a single end-to-end learnable framework. At the core of our approach is a novel spatio-temporal formulation that operates in a common voxelized feature space aggregated from single or multiple camera views. After a person detection step, a 4D CNN produces short-term person-specific representations which are then linked across time by a differentiable matcher. The linked descriptions are then merged and deconvolved into 3D poses. This joint spatio-temporal formulation contrasts with previous piece-wise strategies that treat 2D pose estimation, 2D-to-3D lifting, and 3D pose tracking as independent sub-problems that are error-prone when solved in isolation. Furthermore, unlike previous methods, TesseTrack is robust to changes in the number of camera views and achieves very good results even if a single view is available at inference time. Quantitative evaluation of 3D pose reconstruction accuracy on standard benchmarks shows significant improvements over the state of the art. Evaluation of multi-person articulated 3D pose tracking in our novel evaluation framework demonstrates the superiority of TesseTrack over strong baselines.
CVPR 2021
Results from the Paper
Ranked #1 on 3D Human Pose Estimation on Panoptic (using extra training data)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
3D Multi-Person Pose Estimation | Campus | TesseTrack | PCP3D | 97.4 | #1
3D Pose Estimation | Human3.6M | TesseTrack | Average MPJPE (mm) | 18.7 | #1
3D Human Pose Estimation | Human3.6M | TesseTrack (Monocular) | Average MPJPE (mm) | 44.6 | #119
3D Human Pose Estimation | Human3.6M | TesseTrack (Monocular) | Using 2D ground-truth joints | No | #2
3D Human Pose Estimation | Human3.6M | TesseTrack (Monocular) | Multi-View or Monocular | Monocular | #1
3D Human Pose Estimation | Human3.6M | TesseTrack (Multi-View) | Average MPJPE (mm) | 18.7 | #10
3D Human Pose Estimation | Human3.6M | TesseTrack (Multi-View) | Using 2D ground-truth joints | No | #2
3D Human Pose Estimation | Human3.6M | TesseTrack (Multi-View) | Multi-View or Monocular | Multi-View | #1
3D Human Pose Tracking | Panoptic | TesseTrack | 3DMOTA | 94.1 | #1
3D Human Pose Estimation | Panoptic | TesseTrack Multi-View (5 views) | Average MPJPE (mm) | 7.3 | #1
3D Multi-Person Pose Estimation | Panoptic | TesseTrack | Average MPJPE (mm) | 7.3 | #1
3D Human Pose Estimation | Panoptic | TesseTrack Monocular | Average MPJPE (mm) | 18.9 | #5
3D Multi-Person Pose Estimation | Shelf | TesseTrack (paper) | PCP3D | 98.2 | #1
3D Multi-Person Pose Estimation | Shelf | TesseTrack (correct) | PCP3D | 97.9 | #4
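
Several rows above report Average MPJPE (mean per-joint position error) in millimetres. As a rough illustration of what that number measures, the sketch below computes the mean Euclidean distance between predicted and ground-truth 3D joints for a single pose. The function name `mpjpe` and the three-joint example data are hypothetical, and the sketch leaves out any alignment or averaging protocol a particular benchmark may apply, so it should not be read as the evaluation code behind these results.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between corresponding 3D joints.

    Both arguments are sequences of (x, y, z) tuples in the same unit
    (millimetres here) and in the same joint order.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = 0.0
    for pred_joint, gt_joint in zip(predicted, ground_truth):
        # math.dist returns the Euclidean distance between two points.
        total += math.dist(pred_joint, gt_joint)
    return total / len(predicted)


# Hypothetical three-joint pose, coordinates in millimetres (illustration only).
gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0), (100.0, 200.0, 0.0)]

error_mm = mpjpe(pred, gt)
print(f"MPJPE: {error_mm:.2f} mm")
```

In this toy example the per-joint errors are 5 mm, 12 mm, and 0 mm, so the mean is 17/3 mm. A dataset-level figure would typically average this quantity over all evaluated poses.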