This task targets 3D human pose estimation with fewer 3D annotations.
We start with predicted 2D keypoints for unlabeled video, estimate 3D poses, and finally back-project to the input 2D keypoints.
The main objective is to minimize the reprojection loss of the keypoints, which allows our model to be trained on in-the-wild images that have only ground-truth 2D annotations.
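The reprojection loss above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the simple pinhole projection with a hypothetical focal length `f` and principal point `c` stands in for whatever camera model the method actually uses.

```python
import numpy as np

def reproject(pose_3d, f=1.0, c=np.zeros(2)):
    """Back-project 3D joints (J, 3) to 2D (J, 2) with a pinhole camera.

    f and c are illustrative camera parameters, not the method's actual model.
    """
    xy = pose_3d[:, :2] / pose_3d[:, 2:3]  # perspective divide by depth
    return f * xy + c

def reprojection_loss(pose_3d, keypoints_2d):
    """Mean squared distance between back-projected joints and input 2D keypoints."""
    pred_2d = reproject(pose_3d)
    return np.mean(np.sum((pred_2d - keypoints_2d) ** 2, axis=-1))

# A 3D pose whose projection matches the 2D keypoints incurs zero loss.
pose = np.array([[0.1, 0.2, 2.0], [-0.3, 0.1, 2.5]])
kpts = reproject(pose)
print(reprojection_loss(pose, kpts))  # → 0.0
```

Because the loss needs only the 2D keypoints as supervision, it can be applied to in-the-wild images with 2D annotations alone.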
Training accurate 3D human pose estimators requires large amounts of 3D ground-truth data, which are costly to collect.
We propose a unified formulation for the problem of 3D human pose estimation from a single raw RGB image that reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks.
End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses with limited and fixed training data.
In this paper, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without annotations.
Assuming that the texture of the person does not change dramatically between frames, we can apply a novel texture consistency loss, which enforces that each point in the texture map has the same texture value across all frames.
This effectively prevents simple memorization of the training data and allows for weakly supervised training.
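The texture consistency idea can be sketched as a simple per-texel loss. This is a hedged illustration under assumed tensor shapes, not the paper's code: `textures` holds the texture map sampled from each frame and `visibility` marks which texels were observed, and each visible texel is penalized for deviating from its mean value across frames.

```python
import numpy as np

def texture_consistency_loss(textures, visibility):
    """textures: (T, H, W, C) texture maps sampled from T frames.
    visibility: (T, H, W) binary mask of texels observed in each frame.

    Assumed shapes are illustrative. Penalizes each visible texel's squared
    deviation from its across-frame mean, so a point's texture value is
    encouraged to stay constant over all frames.
    """
    vis = visibility[..., None]                       # (T, H, W, 1)
    counts = np.maximum(vis.sum(axis=0), 1.0)         # avoid divide-by-zero
    mean_tex = (textures * vis).sum(axis=0) / counts  # per-texel mean over frames
    sq_err = ((textures - mean_tex) ** 2) * vis       # only visible texels count
    return sq_err.sum() / np.maximum(vis.sum(), 1.0)

# Identical textures across frames give zero loss; a changed frame does not.
tex = np.ones((3, 4, 4, 3))
vis = np.ones((3, 4, 4))
print(texture_consistency_loss(tex, vis))  # → 0.0
```

Masking by visibility matters: texels hidden in a frame carry no texture evidence there and should not be penalized.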