TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Weakly-supervised 3D Human Pose Estimation	Human3.6M	self-supervised mocap	Average MPJPE (mm)	98.4	# 25
3D Human Reconstruction	Surreal	self-supervised mocap	MPVPE	74.5	# 2
3D Human Pose Estimation	Surreal	self-supervised mocap	MPJPE	64.4	# 5
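
The MPJPE and MPVPE values above are mean per-joint and mean per-vertex position errors: the average Euclidean distance between predicted and ground-truth 3D points. The sketch below shows only that core computation. The function name `mean_per_point_error` and the toy coordinates are illustrative assumptions, and any alignment step (for example root-joint centering) used by a specific benchmark protocol is not covered, since this page does not state it.

```python
import math


def mean_per_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    Applied to skeleton joints this is MPJPE; applied to mesh
    vertices it is MPVPE. Units follow the inputs (e.g. mm).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy example (hypothetical coordinates): per-point errors are 0 and 5.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = mean_per_point_error(pred, gt)
print(error)
```

The same function serves both metrics because they differ only in which point set is compared.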

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/self-supervised-learning-of-motion-capture/3d-human-reconstruction-on-surreal)](https://paperswithcode.com/sota/3d-human-reconstruction-on-surreal?p=self-supervised-learning-of-motion-capture)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/self-supervised-learning-of-motion-capture/3d-human-pose-estimation-on-surreal-1)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-surreal-1?p=self-supervised-learning-of-motion-capture)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/self-supervised-learning-of-motion-capture/weakly-supervised-3d-human-pose-estimation-on)](https://paperswithcode.com/sota/weakly-supervised-3d-human-pose-estimation-on?p=self-supervised-learning-of-motion-capture)`

Self-supervised Learning of Motion Capture

NeurIPS 2017 · Hsiao-Yu Fish Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina Fragkiadaki

Current state-of-the-art solutions for motion capture from a single camera are optimization-driven: they optimize the parameters of a 3D human model so that its re-projection matches measurements in the video (e.g., person segmentation, optical flow, and keypoint detections). Optimization models are susceptible to local minima. This has been the bottleneck that forced using clean green-screen-like backgrounds at capture time, manual initialization, or switching to multiple cameras as input. In this work, we propose a learning-based motion capture model for single camera input. Instead of optimizing mesh and skeleton parameters directly, our model optimizes neural network weights that predict 3D shape and skeleton configurations given a monocular RGB video. Our model is trained using a combination of strong supervision from synthetic data and self-supervision from differentiable rendering of (a) skeletal keypoints, (b) dense 3D mesh motion, and (c) human-background segmentation, in an end-to-end framework. Empirically, we show that our model combines the best of both worlds of supervised learning and test-time optimization: supervised learning initializes the model parameters in the right regime, ensuring good pose and surface initialization at test time without manual effort. Self-supervision by back-propagating through differentiable rendering allows (unsupervised) adaptation of the model to the test data and offers a much tighter fit than a pretrained fixed model. We show that the proposed model improves with experience and converges to low-error solutions where previous optimization methods fail.
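
To make the keypoint self-supervision term (a) more concrete, here is a minimal sketch of a reprojection loss: predicted 3D joints are projected into the image and compared with detected 2D keypoints. It assumes a simple pinhole camera with a hypothetical focal length and principal point; the function names `project` and `keypoint_reprojection_loss`, the camera model, and the toy numbers are illustrative assumptions, not the paper's implementation. In the paper's setting the 3D joints would come from a network and the loss would be back-propagated into its weights, which this sketch omits.

```python
def project(joints_3d, focal=1000.0, cx=0.0, cy=0.0):
    """Pinhole projection of 3D points (x, y, z with z > 0) to pixels."""
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in joints_3d]


def keypoint_reprojection_loss(joints_3d, keypoints_2d, focal=1000.0):
    """Mean squared pixel distance between projected joints and detections."""
    if len(joints_3d) != len(keypoints_2d) or not joints_3d:
        raise ValueError("inputs must be non-empty and the same length")
    projected = project(joints_3d, focal)
    total = sum(
        (u - ku) ** 2 + (v - kv) ** 2
        for (u, v), (ku, kv) in zip(projected, keypoints_2d)
    )
    return total / len(keypoints_2d)


# Hypothetical 2D detections for two joints, in pixels.
detections = [(0.0, 0.0), (50.0, 0.0)]

# A 3D estimate whose projection matches the detections ...
good_estimate = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0)]
# ... and one whose second joint is displaced sideways.
bad_estimate = [(0.0, 0.0, 2.0), (0.2, 0.0, 2.0)]

loss_good = keypoint_reprojection_loss(good_estimate, detections)
loss_bad = keypoint_reprojection_loss(bad_estimate, detections)
print(loss_good, loss_bad)
```

A lower loss means the projected skeleton agrees more closely with the 2D evidence, which is the signal a self-supervised model can use on unlabeled test video.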

Code

chingswy/HumanPoseMemo

168

Tasks

3D Human Pose Estimation

3D Human Reconstruction

Optical Flow Estimation

Self-Supervised Learning

Weakly-supervised 3D Human Pose Estimation

Datasets

Human3.6M

SURREAL

Results from the Paper

Ranked #2 on 3D Human Reconstruction on Surreal
