MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, these works ignore the fact that lifting 2D poses to 3D is an inverse problem for which multiple feasible solutions (i.e., hypotheses) exist. To alleviate this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. To effectively model multi-hypothesis dependencies and build strong relationships across hypothesis features, the task is decomposed into three stages: (i) generate multiple initial hypothesis representations; (ii) model self-hypothesis communication, merging the hypotheses into a single converged representation and then partitioning it into several diverged hypotheses; (iii) learn cross-hypothesis communication and aggregate the multi-hypothesis features to synthesize the final 3D pose. Through these stages, the final representation is progressively enriched and the synthesized pose becomes considerably more accurate. Extensive experiments show that MHFormer achieves state-of-the-art results on two challenging datasets: Human3.6M and MPI-INF-3DHP. Without bells and whistles, it surpasses the previous best result by a large margin of 3% on Human3.6M. Code and models are available at \url{https://github.com/Vegetebird/MHFormer}.
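
The three-stage decomposition above maps naturally onto a small neural module. The PyTorch sketch below only illustrates the data flow (hypothesis generation, per-hypothesis self-attention followed by a merge/partition step, and attention across the hypothesis axis); every layer choice, dimension, and name here is a placeholder assumption rather than MHFormer's actual architecture, which lives in the linked repository.

```python
import torch
import torch.nn as nn

class MultiHypothesisSketch(nn.Module):
    """Illustrative three-stage multi-hypothesis pipeline (not MHFormer itself)."""

    def __init__(self, num_joints=17, dim=64, num_hyp=3):
        super().__init__()
        self.num_hyp = num_hyp
        # Stage (i): one embedding per hypothesis from the 2D joint sequence
        self.embed = nn.ModuleList(
            nn.Linear(num_joints * 2, dim) for _ in range(num_hyp))
        # Stage (ii): independent self-attention per hypothesis (temporal axis),
        # then merge into one converged representation and re-partition it
        self.self_attn = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(num_hyp))
        self.merge = nn.Linear(num_hyp * dim, dim)
        self.partition = nn.Linear(dim, num_hyp * dim)
        # Stage (iii): attention across the hypothesis axis, then regression
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(num_hyp * dim, num_joints * 3)

    def forward(self, x2d):
        # x2d: (batch, frames, joints, 2) sequence of 2D joint positions
        b, f, j, _ = x2d.shape
        x = x2d.reshape(b, f, j * 2)
        # (i) multiple initial hypothesis representations
        hyps = [emb(x) for emb in self.embed]                  # num_hyp x (b, f, dim)
        # (ii) self-hypothesis communication, then merge and partition
        hyps = [attn(h) for attn, h in zip(self.self_attn, hyps)]
        merged = self.merge(torch.cat(hyps, dim=-1))           # (b, f, dim)
        hyps = self.partition(merged).chunk(self.num_hyp, dim=-1)
        # (iii) cross-hypothesis communication and aggregation
        h = torch.stack(hyps, dim=2).reshape(b * f, self.num_hyp, -1)
        h, _ = self.cross_attn(h, h, h)                        # attend across hypotheses
        pose = self.head(h.reshape(b, f, -1))                  # (b, f, joints * 3)
        return pose.reshape(b, f, j, 3)

model = MultiHypothesisSketch()
out = model(torch.randn(2, 27, 17, 2))   # 27-frame clip, 17 joints
print(out.shape)                          # torch.Size([2, 27, 17, 3])
```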

Published at CVPR 2022.
Results

3D Human Pose Estimation on Human3.6M (monocular input):
  MHFormer (GT): Average MPJPE 30.5 mm (rank #35; uses 2D ground-truth joints)
  MHFormer: Average MPJPE 43 mm (rank #87; 2D ground-truth joints not used)

3D Human Pose Estimation on MPI-INF-3DHP:
  MHFormer: AUC 63.3 (rank #21), MPJPE 58 mm (rank #22), PCK 93.8 (rank #20)
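
Both benchmarks report MPJPE (mean per-joint position error) as the headline metric. The snippet below is a generic, unofficial sketch of how it is typically computed: the mean Euclidean distance between predicted and ground-truth joints, evaluated root-relative (treating joint 0 as the root is an assumption; the official evaluation scripts should be used for comparable numbers).

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between
    predicted and ground-truth 3D joints, here after subtracting each
    skeleton's root joint (index 0 is assumed to be the root)."""
    pred = pred - pred[..., :1, :]
    gt = gt - gt[..., :1, :]
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example with (frames, joints, 3) arrays in millimetres
gt = np.random.randn(10, 17, 3) * 100
pred = gt + np.random.randn(10, 17, 3) * 10
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```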
