On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos

ICCV 2019  ·  Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang

Training an accurate 3D human pose estimation network requires a huge amount of richly annotated training data, yet manually obtaining rich and accurate annotations is, if not impossible, tedious and slow. In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation task. First, a baseline model is trained with a small set of annotations. By fixing the reliable estimations produced by the resulting model, our method automatically collects annotations across the entire video by solving a 3D trajectory completion problem. The baseline model is then further trained with the collected annotations to learn the new poses. We evaluate our method on the widely adopted Human3.6M and MPI-INF-3DHP datasets. As the experiments show, given only a small set of annotations, our method enables the model to learn new poses from unlabelled monocular videos, improving the accuracy of the baseline model by about 10%. In contrast to previous approaches, our method relies on neither multi-view imagery nor explicit 2D keypoint annotations.
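The trajectory-completion step can be pictured with a simple low-rank temporal prior: keep only the high-confidence per-frame estimates, fit a small number of temporal basis coefficients to them, and reconstruct the full trajectory to obtain pseudo-annotations for the unreliable frames. The sketch below is a minimal illustration under assumed details (a DCT-II temporal basis, a least-squares fit, and per-coordinate processing), not the paper's exact formulation; `complete_trajectory` and its parameters are hypothetical names.

```python
import numpy as np

def dct_basis(T, K):
    """First K DCT-II basis vectors of length T (as columns), a common
    low-rank temporal prior for smooth 3D joint trajectories."""
    t = np.arange(T)
    return np.stack([np.cos(np.pi * (t + 0.5) * k / T) for k in range(K)], axis=1)

def complete_trajectory(est, reliable, K=8):
    """Fill in unreliable frames of one joint-coordinate trajectory.

    est:      (T,) per-frame estimates from the baseline model
    reliable: (T,) boolean mask of frames whose estimates we trust
    Fits K DCT coefficients by least squares on the reliable frames only,
    then reconstructs the whole trajectory (illustrative assumption).
    """
    B = dct_basis(len(est), K)
    coef, *_ = np.linalg.lstsq(B[reliable], est[reliable], rcond=None)
    return B @ coef

# Toy usage: a 100-frame trajectory with ~60% of frames marked reliable.
rng = np.random.default_rng(0)
T = 100
truth = np.sin(np.linspace(0, 3 * np.pi, T))        # smooth ground-truth motion
est = truth + 0.05 * rng.standard_normal(T)         # noisy baseline estimates
reliable = rng.random(T) < 0.6                      # confidence-thresholded mask
pseudo_labels = complete_trajectory(est, reliable)
print(np.abs(pseudo_labels - truth).mean())         # completion error
```

In practice such a completion would be applied per joint and per coordinate, and the completed trajectories would serve as the collected annotations for further training.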

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Weakly-supervised 3D Human Pose Estimation | Human3.6M | Li et al. | Average MPJPE (mm) | 88.8 | #23 |
| Weakly-supervised 3D Human Pose Estimation | Human3.6M | Li et al. | Number of Views | 1 | #1 |
| Weakly-supervised 3D Human Pose Estimation | Human3.6M | Li et al. | 3D Annotations | S1 | #1 |
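For reference, MPJPE (Mean Per Joint Position Error) averages the Euclidean distance between predicted and ground-truth 3D joint positions, conventionally after centring both poses on the root joint. A minimal sketch, assuming root-centred evaluation; the joint count and root index are illustrative:

```python
import numpy as np

def mpjpe(pred, gt, root=0):
    """Mean Per Joint Position Error in the units of the inputs (mm here).

    pred, gt: (N, J, 3) arrays of predicted / ground-truth 3D joints.
    Both poses are root-centred before per-joint distances are averaged.
    """
    pred = pred - pred[:, root:root + 1]   # align on the root (pelvis) joint
    gt = gt - gt[:, root:root + 1]
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy usage with random 17-joint poses (Human3.6M uses a 17-joint skeleton).
rng = np.random.default_rng(0)
gt = rng.standard_normal((4, 17, 3)) * 100          # millimetres
pred = gt + rng.standard_normal((4, 17, 3)) * 10    # noisy predictions
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```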
