Monocular, One-stage, Regression of Multiple 3D People

ICCV 2021 · Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei

This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression.
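The body-center-guided sampling step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the released ROMP code: the function name `sample_mesh_params`, the 3x3 peak window, the confidence threshold, and the toy 4-channel parameter map are all assumptions made for illustration. The sketch treats local maxima of the Body Center heatmap as detected people and reads one parameter vector per person from the Mesh Parameter map at those pixels.

```python
import numpy as np

def sample_mesh_params(center_heatmap, param_map, threshold=0.25):
    """Read per-person parameters at the peaks of a body-center heatmap.

    center_heatmap: (H, W) float array of body-center confidences.
    param_map:      (C, H, W) array holding a C-dim vector per pixel.
    Returns (centers, params): an (N, 2) array of (row, col) peak
    positions and an (N, C) array of the parameters sampled there.
    The threshold and 3x3 window are illustrative assumptions.
    """
    h, w = center_heatmap.shape
    # Pad with -inf so border pixels also get a full 3x3 neighborhood.
    padded = np.pad(center_heatmap, 1, mode="constant",
                    constant_values=-np.inf)
    # Stack the nine shifted views; their max is a 3x3 max filter.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    local_max = windows.max(axis=0)
    # A pixel is a body center if it is the local maximum and confident.
    is_peak = (center_heatmap == local_max) & (center_heatmap > threshold)
    rows, cols = np.nonzero(is_peak)
    centers = np.stack([rows, cols], axis=1)
    # Index every channel at the peak pixels, then put people first.
    params = param_map[:, rows, cols].T
    return centers, params

# Toy input: two people plus a weaker response next to the first peak.
heatmap = np.zeros((8, 8))
heatmap[2, 3] = 0.9
heatmap[6, 5] = 0.7
heatmap[2, 4] = 0.4  # adjacent to a stronger peak, so it is suppressed
param_map = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)

centers, params = sample_mesh_params(heatmap, param_map)
print(centers)
print(params.shape)
```

Because every pixel already carries a full parameter vector, extracting all people reduces to a peak search plus an indexing operation, with no per-person cropping or second network pass.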


Results from the Paper


 Ranked #1 on 3D Multi-Person Mesh Recovery on Relative Human (using extra training data)

Task                           Dataset                         Model             Metric Name             Metric Value  Global Rank
3D Human Pose Estimation       3D Poses in the Wild Challenge  ROMP              MPJPE                   81.76         #2
3D Human Pose Estimation       3DPW                            ROMP              PA-MPJPE                47.3          #47
3D Human Pose Estimation       3DPW                            ROMP              MPJPE                   76.7          #47
3D Human Pose Estimation       3DPW                            ROMP              MPVPE                   93.4          #43
Multi-Person Pose Estimation   CrowdPose                       ROMP+CAR          mAP @0.5:0.95           58.6          #19
Multi-Person Pose Estimation   CrowdPose                       ROMP              mAP @0.5:0.95           55.6          #22
3D Human Pose Estimation       EMDB                            ROMP              Average MPJPE (mm)      112.652       #7
3D Human Pose Estimation       EMDB                            ROMP              Average MPJPE-PA (mm)   75.1869       #9
3D Human Pose Estimation       EMDB                            ROMP              Average MVE (mm)        134.863       #9
3D Human Pose Estimation       EMDB                            ROMP              Average MVE-PA (mm)     90.648        #7
3D Human Pose Estimation       EMDB                            ROMP              Average MPJAE (deg)     26.5975       #7
3D Human Pose Estimation       EMDB                            ROMP              Average MPJAE-PA (deg)  23.9901       #3
3D Human Pose Estimation       EMDB                            ROMP              Jitter (10m/s^3)        71.2556       #4
3D Human Pose Estimation       Panoptic                        ROMP (ResNet-50)  Average MPJPE (mm)      127.6         #7
3D Multi-Person Mesh Recovery  Relative Human                  ROMP              PCDR                    68.27         #1
3D Depth Estimation            Relative Human                  ROMP              PCDR                    54.84         #2
3D Depth Estimation            Relative Human                  ROMP              PCDR-Baby               30.08         #3
3D Depth Estimation            Relative Human                  ROMP              PCDR-Kid                48.41         #2
3D Depth Estimation            Relative Human                  ROMP              PCDR-Teen               51.12         #3
3D Depth Estimation            Relative Human                  ROMP              PCDR-Adult              55.34         #3
3D Depth Estimation            Relative Human                  ROMP              mPCDK                   0.866         #2

Methods