TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Human Pose Estimation	Human3.6M	FLEX (Using 2D GT)	Average MPJPE (mm)	24.8	# 18
3D Human Pose Estimation	Human3.6M	FLEX (Using 2D GT)	Using 2D ground-truth joints	Yes	# 2
3D Human Pose Estimation	Human3.6M	FLEX (Using 2D GT)	Multi-View or Monocular	Multi-View	# 1
3D Human Pose Estimation	Human3.6M	FLEX	Average MPJPE (mm)	30.9	# 37
3D Human Pose Estimation	Human3.6M	FLEX	Using 2D ground-truth joints	No	# 2
3D Human Pose Estimation	Human3.6M	FLEX	Multi-View or Monocular	Multi-View	# 1

FLEX: Extrinsic Parameters-free Multi-view 3D Human Motion Reconstruction

5 May 2021 · Brian Gordon, Sigal Raab, Guy Azov, Raja Giryes, Daniel Cohen-Or

The increasing availability of video recordings made by multiple cameras has offered new means for mitigating occlusion and depth ambiguities in pose and motion reconstruction methods. Yet, multi-view algorithms strongly depend on camera parameters, in particular on the relative transformations between the cameras. Such a dependency becomes a hurdle when shifting to dynamic capture in uncontrolled settings. We introduce FLEX (Free muLti-view rEconstruXion), an end-to-end extrinsic parameter-free multi-view model. FLEX is extrinsic parameter-free (dubbed ep-free) in the sense that it does not require extrinsic camera parameters. Our key idea is that the 3D angles between skeletal parts, as well as bone lengths, are invariant to the camera position. Hence, learning 3D rotations and bone lengths rather than locations allows predicting common values for all camera views. Our network takes multiple video streams, learns fused deep features through a novel multi-view fusion layer, and reconstructs a single consistent skeleton with temporally coherent joint rotations. We demonstrate quantitative and qualitative results on three public datasets and on synthetic multi-person video streams captured by dynamic cameras. We compare our model to state-of-the-art methods that are not ep-free and show that, in the absence of camera parameters, we outperform them by a large margin, while obtaining comparable results when camera parameters are available. Code, trained models, and other materials are available on our project page.
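The key idea, that bone lengths and the angles between skeletal parts do not depend on the camera position, can be checked numerically: a change of camera is a rigid transform x → Rx + t, which preserves distances and dot products. The sketch below assumes NumPy, a hypothetical four-joint chain `PARENTS`, and random toy joint positions; it places the same skeleton in two camera frames with different extrinsics. Joint positions differ between the views, while bone lengths and the unsigned angle between each bone and its parent bone agree. It only checks these scalar quantities and is not FLEX's own representation, which the abstract describes as 3D joint rotations plus bone lengths.

```python
import numpy as np

# Hypothetical 4-joint chain (e.g. hip -> knee -> ankle -> toe); -1 marks the root.
PARENTS = [-1, 0, 1, 2]


def bone_lengths(joints):
    """Length of the bone from each non-root joint to its parent, shape (J-1,)."""
    return np.array([np.linalg.norm(joints[j] - joints[p])
                     for j, p in enumerate(PARENTS) if p >= 0])


def bone_angles(joints):
    """Unsigned angle (radians) between each bone and its parent bone."""
    angles = []
    for j, p in enumerate(PARENTS):
        if p < 0 or PARENTS[p] < 0:
            continue  # the root and its direct children have no parent bone
        u = joints[p] - joints[PARENTS[p]]  # parent bone
        v = joints[j] - joints[p]           # child bone
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)


def rotation(axis, theta):
    """Rodrigues' formula: rotation by theta (radians) about the given axis."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def to_camera(joints, R, t):
    """Express world-space joints (J, 3) in a camera frame with extrinsics (R, t)."""
    return joints @ R.T + t


rng = np.random.default_rng(0)
world = rng.normal(size=(len(PARENTS), 3))  # toy world-space joint positions

# Two cameras with different extrinsics, which an ep-free model never sees.
cam_a = to_camera(world, rotation([0, 0, 1], 0.3), np.array([0.0, 0.0, 5.0]))
cam_b = to_camera(world, rotation([1, 1, 0], -1.2), np.array([1.0, 2.0, 4.0]))

print(np.allclose(cam_a, cam_b))                              # False: positions differ per view
print(np.allclose(bone_lengths(cam_a), bone_lengths(cam_b)))  # True
print(np.allclose(bone_angles(cam_a), bone_angles(cam_b)))    # True
```

Because these quantities agree across views, a network that predicts them can output one shared set of values for all cameras, which is the property the abstract relies on.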

Code

BrianG13/FLEX official

Tasks

3D Human Pose Estimation

Datasets

Human3.6M

KTH Multiview Football II

Results from the Paper

Ranked #18 on 3D Human Pose Estimation on Human3.6M
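The ranking uses Average MPJPE (mm), where lower is better: FLEX reports 24.8 mm when given 2D ground-truth joints and 30.9 mm without them. MPJPE (mean per-joint position error) is the Euclidean distance between predicted and ground-truth 3D joints, averaged over joints and frames. A minimal NumPy sketch follows; the optional root alignment is a common Human3.6M convention, and this page does not state which evaluation protocol produced the reported numbers.

```python
import numpy as np


def mpjpe(pred, gt, root_index=None):
    """Mean Per-Joint Position Error, in the units of the inputs (e.g. mm).

    pred, gt: arrays of shape (..., J, 3) with predicted and ground-truth 3D joints.
    root_index: optional joint index subtracted from every joint first
                (root alignment); None compares the positions as given.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    if root_index is not None:
        pred = pred - pred[..., root_index:root_index + 1, :]
        gt = gt - gt[..., root_index:root_index + 1, :]
    # Euclidean error per joint, then the mean over all joints (and frames).
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


gt = np.zeros((2, 3))
pred = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]])
print(mpjpe(pred, gt))  # 2.5: per-joint errors of 5 and 0, averaged

shifted = gt + np.array([10.0, 0.0, 0.0])
print(mpjpe(shifted, gt))                # 10.0
print(mpjpe(shifted, gt, root_index=0))  # 0.0: root alignment removes a global offset
```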

Methods

Linear Layer • Multi-Head Attention • Scaled Dot-Product Attention • Softmax
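The Methods list names Linear Layer, Multi-Head Attention, Scaled Dot-Product Attention and Softmax, but this page does not say how they are wired into FLEX's multi-view fusion layer. As a hypothetical illustration only, the sketch below applies single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V, across the view axis: each camera's feature attends to all cameras, and the outputs are averaged into one fused feature. The function `fuse_views`, the mean over views, and the random projection weights are assumptions, not the authors' design; multi-head attention would run several such heads with separate projections and concatenate their outputs.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d_k)) v for q: (N, d_k), k: (M, d_k), v: (M, d_v)."""
    d_k = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # each row sums to 1
    return weights @ v, weights


def fuse_views(view_feats, w_q, w_k, w_v):
    """Hypothetical fusion: every view attends to all views, then outputs are averaged.

    view_feats: (V, D) per-view features for a single frame.
    w_q, w_k, w_v: (D, d) bias-free linear projections.
    Returns one fused feature of shape (d,).
    """
    q, k, v = view_feats @ w_q, view_feats @ w_k, view_feats @ w_v
    attended, _ = scaled_dot_product_attention(q, k, v)
    return attended.mean(axis=0)


rng = np.random.default_rng(0)
V, D, d = 4, 16, 8  # number of views, input feature size, projection size (toy values)
feats = rng.normal(size=(V, D))
w_q, w_k, w_v = (rng.normal(size=(D, d)) / np.sqrt(D) for _ in range(3))

fused = fuse_views(feats, w_q, w_k, w_v)
print(fused.shape)  # (8,)

# Reordering the cameras leaves the fused feature unchanged.
perm = [2, 0, 3, 1]
print(np.allclose(fused, fuse_views(feats[perm], w_q, w_k, w_v)))  # True
```

Averaging attention outputs over views makes the result independent of camera order, which is one plausible way to handle a set of cameras with no known extrinsics; the actual FLEX layer may differ.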
