Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation
We present a lightweight solution for recovering 3D pose from multi-view images captured with spatially calibrated cameras. Building upon recent advances in interpretable representation learning, we exploit 3D geometry to fuse input images into a unified latent representation of pose that is disentangled from camera viewpoints. This allows us to reason effectively about 3D pose across different views without resorting to compute-intensive volumetric grids. Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections, which can then be lifted to 3D via a differentiable Direct Linear Transform (DLT) layer. To make this step efficient, we propose a novel implementation of DLT that is orders of magnitude faster on GPU architectures than standard SVD-based triangulation methods. We evaluate our approach on two large-scale human pose datasets (Human3.6M and Total Capture): our method outperforms or performs comparably to state-of-the-art volumetric methods while, unlike them, achieving real-time performance.
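For context, the lifting step the abstract refers to can be illustrated with the standard SVD-based DLT triangulation that the paper's GPU implementation improves upon. The sketch below is not the paper's method; it is the classical baseline, assuming known 3x4 projection matrices and one 2D detection per view:

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D point from N calibrated views via the
    Direct Linear Transform (classical SVD-based variant).

    proj_mats: (N, 3, 4) camera projection matrices.
    points_2d: (N, 2) per-view pixel detections of the same joint.
    Returns the 3D point in world coordinates.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: u*(P[2]X) = P[0]X, v*(P[2]X) = P[1]X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)  # (2N, 4)
    # Least-squares solution: right singular vector of A with the
    # smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

Each joint is triangulated independently from its per-view detections; the paper replaces the SVD solve with a faster GPU-friendly formulation and makes the layer differentiable for end-to-end training.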
PDF Abstract (CVPR 2020)
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| 3D Human Pose Estimation | Human3.6M | LWCDR (extra train data) | Average MPJPE (mm) | 21.0 | # 11 | Yes |
| | | | Using 2D ground-truth joints | No | # 2 | |
| | | | Multi-View or Monocular | Multi-View | # 1 | |
| 3D Human Pose Estimation | Human3.6M | LWCDR | Average MPJPE (mm) | 30.2 | # 33 | |
| | | | Using 2D ground-truth joints | No | # 2 | |
| | | | Multi-View or Monocular | Multi-View | # 1 | |
| 3D Human Pose Estimation | Total Capture | LWCDR | Average MPJPE (mm) | 27.5 | # 4 | |