Sapiens: Foundation for Human Vision Models

We present Sapiens, a family of models for four fundamental human-centric vision tasks -- 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images. We observe that, given the same computational budget, self-supervised pretraining on a curated dataset of human images significantly boosts the performance for a diverse set of human-centric tasks. The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic. Our simple model design also brings scalability -- model performance across tasks improves as we scale the number of parameters from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks. We achieve significant improvements over the prior state-of-the-art on Humans-5K (pose) by 7.6 mAP, Humans-2K (part-seg) by 17.1 mIoU, Hi4D (depth) by 22.4% relative RMSE, and THuman2 (normal) by 53.5% relative angular error. Project page: https://about.meta.com/realitylabs/codecavatars/sapiens.


Results from the Paper


Ranked #1 on Keypoint Detection on MS COCO (using extra training data)

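Task                      Dataset         Model         Metric Name    Metric Value  Global Rank
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.3B  WB             62.0          # 9
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.3B  body           66.4          # 12
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.3B  foot           67.3          # 9
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.3B  face           87.1          # 8
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.3B  hand           58.1          # 8
2D Human Pose Estimation  COCO-WholeBody  Sapiens-2B    WB             74.4          # 1
2D Human Pose Estimation  COCO-WholeBody  Sapiens-2B    body           79.2          # 1
2D Human Pose Estimation  COCO-WholeBody  Sapiens-2B    foot           84.1          # 1
2D Human Pose Estimation  COCO-WholeBody  Sapiens-2B    face           91.2          # 1
2D Human Pose Estimation  COCO-WholeBody  Sapiens-2B    hand           70.4          # 1
2D Human Pose Estimation  COCO-WholeBody  Sapiens-1B    WB             72.7          # 2
2D Human Pose Estimation  COCO-WholeBody  Sapiens-1B    body           77.4          # 2
2D Human Pose Estimation  COCO-WholeBody  Sapiens-1B    foot           83.0          # 2
2D Human Pose Estimation  COCO-WholeBody  Sapiens-1B    face           90.7          # 2
2D Human Pose Estimation  COCO-WholeBody  Sapiens-1B    hand           69.2          # 2
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.6B  WB             69.5          # 3
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.6B  body           74.3          # 5
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.6B  foot           79.4          # 4
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.6B  face           89.5          # 3
2D Human Pose Estimation  COCO-WholeBody  Sapiens-0.6B  hand           65.4          # 3
Keypoint Detection        MS COCO         Sapiens-2B    Validation AP  82.2          # 1
Keypoint Detection        MS COCO         Sapiens-1B    Validation AP  82.1          # 2
Keypoint Detection        MS COCO         Sapiens-0.6B  Validation AP  81.2          # 3
Keypoint Detection        MS COCO         Sapiens-0.3B  Validation AP  79.6          # 4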

Methods