TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Average MPJPE (mm)	42.1	# 10
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Use Video Sequence	Yes	# 1
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Frames Needed	243	# 33
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Need Ground Truth 2D Pose	No	# 1
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	2D detector	CPN	# 1
3D Human Pose Estimation	Human3.6M	P-STMO (N=243 GT)	Average MPJPE (mm)	29.3	# 28
3D Human Pose Estimation	Human3.6M	P-STMO (N=243 GT)	Using 2D ground-truth joints	Yes	# 2
3D Human Pose Estimation	Human3.6M	P-STMO (N=243 GT)	Multi-View or Monocular	Monocular	# 1
3D Human Pose Estimation	Human3.6M	P-STMO-S (N=81)	Average MPJPE (mm)	44.1	# 97
3D Human Pose Estimation	Human3.6M	P-STMO-S (N=81)	Using 2D ground-truth joints	No	# 2
3D Human Pose Estimation	Human3.6M	P-STMO-S (N=81)	Multi-View or Monocular	Monocular	# 1
3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Average MPJPE (mm)	42.1	# 79
3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Using 2D ground-truth joints	No	# 2
3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Multi-View or Monocular	Monocular	# 1
3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	PA-MPJPE	34.4	# 21
3D Human Pose Estimation	MPI-INF-3DHP	P-STMO (N=81)	AUC	75.8	# 13
3D Human Pose Estimation	MPI-INF-3DHP	P-STMO (N=81)	MPJPE	32.2	# 13
3D Human Pose Estimation	MPI-INF-3DHP	P-STMO (N=81)	PCK	97.9	# 11

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/p-stmo-pre-trained-spatial-temporal-many-to/monocular-3d-human-pose-estimation-on-human3)](https://paperswithcode.com/sota/monocular-3d-human-pose-estimation-on-human3?p=p-stmo-pre-trained-spatial-temporal-many-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/p-stmo-pre-trained-spatial-temporal-many-to/3d-human-pose-estimation-on-mpi-inf-3dhp)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-mpi-inf-3dhp?p=p-stmo-pre-trained-spatial-temporal-many-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/p-stmo-pre-trained-spatial-temporal-many-to/3d-human-pose-estimation-on-human36m)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-human36m?p=p-stmo-pre-trained-spatial-temporal-many-to)`

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation

15 Mar 2022 · Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for the 2D-to-3D human pose estimation task. To reduce the difficulty of capturing spatial and temporal information, we divide this task into two stages: pre-training (Stage I) and fine-tuning (Stage II). In Stage I, a self-supervised pre-training sub-task, termed masked pose modeling, is proposed. The human joints in the input sequence are randomly masked in both the spatial and temporal domains. A general form of denoising auto-encoder is then used to recover the original 2D poses, which lets the encoder capture spatial and temporal dependencies. In Stage II, the pre-trained encoder is loaded into the STMO model and fine-tuned. The encoder is followed by a many-to-one frame aggregator that predicts the 3D pose in the current frame. In particular, an MLP block is used as the spatial feature extractor in STMO, which yields better performance than other methods. In addition, a temporal downsampling strategy is proposed to reduce data redundancy. Extensive experiments on two benchmarks show that our method outperforms state-of-the-art methods with fewer parameters and less computational overhead. For example, our P-STMO model achieves 42.1 mm MPJPE on the Human3.6M dataset when using 2D poses from CPN as inputs. Meanwhile, it delivers a 1.5 to 7.1 times speedup over state-of-the-art methods. Code is available at https://github.com/paTRICK-swk/P-STMO.
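
The masking step of masked pose modeling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the parameters `num_masked_frames` and `num_masked_joints`, and the use of a zero placeholder for hidden joints are all assumptions made for illustration. It only shows the idea of hiding whole frames (temporal masking) and a few joints in each remaining frame (spatial masking) so that an auto-encoder can be trained to recover the original 2D poses.

```python
import random


def mask_pose_sequence(seq, num_masked_frames, num_masked_joints, seed=None):
    """Randomly hide joints of a 2D pose sequence in time and space.

    seq: list of frames; each frame is a list of (x, y) joint tuples.
    num_masked_frames: frames hidden entirely (hypothetical parameter).
    num_masked_joints: joints hidden in every other frame (hypothetical parameter).
    Returns (masked_seq, mask), where mask[t][j] is True for a hidden joint.
    """
    rng = random.Random(seed)
    num_frames = len(seq)
    num_joints = len(seq[0])

    # Temporal masking: pick whole frames to hide.
    hidden_frames = set(rng.sample(range(num_frames), num_masked_frames))

    masked_seq, mask = [], []
    for t, frame in enumerate(seq):
        if t in hidden_frames:
            hidden = set(range(num_joints))
        else:
            # Spatial masking: hide a random subset of joints in this frame.
            hidden = set(rng.sample(range(num_joints), num_masked_joints))
        # Assumption: hidden joints are replaced by a zero placeholder.
        masked_seq.append(
            [(0.0, 0.0) if j in hidden else xy for j, xy in enumerate(frame)]
        )
        mask.append([j in hidden for j in range(num_joints)])
    return masked_seq, mask


# Toy sequence: 9 frames of 17 joints each.
toy = [[(float(t), float(j)) for j in range(17)] for t in range(9)]
masked, mask = mask_pose_sequence(toy, num_masked_frames=3, num_masked_joints=2, seed=0)
```

A reconstruction loss would then compare the decoder output with the unmasked sequence; that part depends on the network and is omitted here.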

Code

patrick-swk/p-stmo official

138

Tasks

3D Human Pose Estimation

Denoising

Monocular 3D Human Pose Estimation

Pose Estimation

Datasets

Human3.6M

MPI-INF-3DHP

Results from the Paper

Ranked #10 on Monocular 3D Human Pose Estimation on Human3.6M

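Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Average MPJPE (mm)	42.1	# 10
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Use Video Sequence	Yes	# 1
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Frames Needed	243	# 33
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Need Ground Truth 2D Pose	No	# 1
Monocular 3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	2D detector	CPN	# 1
3D Human Pose Estimation	Human3.6M	P-STMO-S (N=81)	Average MPJPE (mm)	44.1	# 97
3D Human Pose Estimation	Human3.6M	P-STMO-S (N=81)	Using 2D ground-truth joints	No	# 2
3D Human Pose Estimation	Human3.6M	P-STMO-S (N=81)	Multi-View or Monocular	Monocular	# 1
3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Average MPJPE (mm)	42.1	# 79
3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Using 2D ground-truth joints	No	# 2
3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	Multi-View or Monocular	Monocular	# 1
3D Human Pose Estimation	Human3.6M	P-STMO (N=243)	PA-MPJPE	34.4	# 21
3D Human Pose Estimation	MPI-INF-3DHP	P-STMO (N=81)	AUC	75.8	# 13
3D Human Pose Estimation	MPI-INF-3DHP	P-STMO (N=81)	MPJPE	32.2	# 13
3D Human Pose Estimation	MPI-INF-3DHP	P-STMO (N=81)	PCK	97.9	# 11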
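
The MPJPE values reported above are mean per-joint position errors, commonly defined as the average Euclidean distance between predicted and ground-truth 3D joint positions. A minimal sketch of that computation for a single pose follows; the function name and input layout are assumptions for illustration, and the benchmark-specific steps (such as root alignment, or the rigid Procrustes alignment applied before PA-MPJPE) are left out.

```python
import math


def mpjpe(pred, target):
    """Average Euclidean distance between matching joints.

    pred, target: equally long lists of (x, y, z) joint positions.
    The result is in the same unit as the inputs (e.g. mm).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, target)) / len(pred)


# Two joints: one predicted exactly, one off by a 3-4-5 triangle.
error = mpjpe([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
              [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
print(error)  # 2.5
```

Averaging this value over all test frames gives a dataset-level score comparable in form to the table entries, assuming the same joint set and alignment protocol.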

Results from Other Papers

Task	Dataset	Model	Metric Name	Metric Value	Rank
3D Human Pose Estimation	Human3.6M	P-STMO (N=243 GT)	Average MPJPE (mm)	29.3	# 28
3D Human Pose Estimation	Human3.6M	P-STMO (N=243 GT)	Using 2D ground-truth joints	Yes	# 2
3D Human Pose Estimation	Human3.6M	P-STMO (N=243 GT)	Multi-View or Monocular	Monocular	# 1

Methods

Convolution • CPN • Non Maximum Suppression
