SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach

Human poses that are rare or unseen in a training set are challenging for a network to predict. Similar to the long-tailed distribution problem in visual recognition, the small number of examples for such poses limits the ability of networks to model them. Interestingly, local pose distributions suffer less from the long-tail problem, i.e., local joint configurations within a rare pose may appear within other poses in the training set, making them less rare. We propose to take advantage of this fact for better generalization to rare and unseen poses. To be specific, our method splits the body into local regions and processes them in separate network branches, utilizing the property that a joint position depends mainly on the joints within its local body region. Global coherence is maintained by recombining the global context from the rest of the body into each branch as a low-dimensional vector. With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference. The proposed split-and-recombine approach, called SRNet, can be easily adapted to both single-image and temporal models, and it leads to appreciable improvements in the prediction of rare and unseen poses.
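The split-and-recombine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 17-joint grouping in GROUPS, the context size CONTEXT_DIM, and the random linear projection standing in for a learned layer are all assumptions chosen for illustration. It only shows how each branch could receive its local joints at full dimensionality, plus the rest of the body compressed into a low-dimensional vector.

```python
import random

# Hypothetical split of 17 joint indices into local body regions
# (an assumption for illustration, not the paper's exact grouping).
GROUPS = {
    "torso": [0, 7, 8, 9, 10],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
    "left_leg": [4, 5, 6],
    "right_leg": [1, 2, 3],
}
CONTEXT_DIM = 2  # assumed size of the low-dimensional global context


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a learned layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def split_and_recombine(pose_2d, projections):
    """Build one input vector per branch: the branch's own joints at full
    dimensionality, plus the remaining joints compressed to CONTEXT_DIM values."""
    branch_inputs = {}
    for name, joint_ids in GROUPS.items():
        local = [c for j in joint_ids for c in pose_2d[j]]
        rest = [c for j in range(len(pose_2d)) if j not in joint_ids for c in pose_2d[j]]
        context = matvec(projections[name], rest)
        branch_inputs[name] = local + context
    return branch_inputs


rng = random.Random(0)
pose_2d = [(rng.random(), rng.random()) for _ in range(17)]  # toy 2D pose
projections = {
    name: make_projection(2 * (17 - len(ids)), CONTEXT_DIM, rng)
    for name, ids in GROUPS.items()
}
inputs = split_and_recombine(pose_2d, projections)
for name, vec in inputs.items():
    print(name, len(vec))
```

In this sketch each branch sees only a handful of values from outside its region, so the input statistics of a branch are dominated by the local joint configuration rather than by the full-body pose.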

ECCV 2020
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| 3D Human Pose Estimation | Human3.6M | SRNet (T=243 GT) | Average MPJPE (mm) | 32 | # 41 |
| | | | Using 2D ground-truth joints | Yes | # 2 |
| | | | Multi-View or Monocular | Monocular | # 1 |
| 3D Human Pose Estimation | Human3.6M | SRNet (T=243) | Average MPJPE (mm) | 44.8 | # 104 |
| | | | Using 2D ground-truth joints | No | # 2 |
| | | | Multi-View or Monocular | Monocular | # 1 |
| 3D Human Pose Estimation | Human3.6M | SRNet (T=1 GT) | Average MPJPE (mm) | 33.9 | # 48 |
| | | | Using 2D ground-truth joints | Yes | # 2 |
| | | | Multi-View or Monocular | Monocular | # 1 |
| 3D Human Pose Estimation | Human3.6M | SRNet (T=1) | Average MPJPE (mm) | 49.9 | # 162 |
| | | | Using 2D ground-truth joints | No | # 2 |
| | | | Multi-View or Monocular | Monocular | # 1 |
| Monocular 3D Human Pose Estimation | Human3.6M | SRNET | Average MPJPE (mm) | 49.9 | # 19 |
| | | | Use Video Sequence | No | # 1 |
| | | | Frames Needed | 1 | # 1 |
| | | | Need Ground Truth 2D Pose | No | # 1 |
| 3D Human Pose Estimation | MPI-INF-3DHP | SRNET | AUC | 43.8 | # 56 |
| | | | PCK | 77.6 | # 66 |
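For readers unfamiliar with the MPJPE metric reported above, here is a small sketch of how a mean per-joint position error can be computed: the Euclidean distance between each predicted joint and its ground-truth joint, averaged over joints. The function name mpjpe and the toy poses are illustrative assumptions. Benchmark protocols typically also align the root joint before measuring, which this sketch omits.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance over joints, in the units of the inputs (e.g. mm)."""
    if len(predicted) != len(target):
        raise ValueError("poses must have the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(predicted)


# Toy two-joint poses in millimetres (made-up values, not from any dataset).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = mpjpe(pred, gt)
print(error)
```

Lower values are better; the first joint contributes zero error and the second contributes its full distance from the ground truth.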
