TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Human Pose Estimation	3DPW	GLoT	PA-MPJPE	50.6	# 56
3D Human Pose Estimation	3DPW	GLoT	MPJPE	80.7	# 59
3D Human Pose Estimation	3DPW	GLoT	MPVPE	96.3	# 46
3D Human Pose Estimation	3DPW	GLoT	Acceleration Error	6.6	# 1
3D Human Pose Estimation	Human3.6M	GLoT	Average MPJPE (mm)	67	# 269
3D Human Pose Estimation	Human3.6M	GLoT	PA-MPJPE	46.3	# 82
3D Human Pose Estimation	Human3.6M	GLoT	Acceleration Error	3.6	# 3
3D Human Pose Estimation	MPI-INF-3DHP	GLoT	MPJPE	93.9	# 51
3D Human Pose Estimation	MPI-INF-3DHP	GLoT	PA-MPJPE	61.5	# 6
3D Human Pose Estimation	MPI-INF-3DHP	GLoT	Acceleration Error	7.9	# 1
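
For readers unfamiliar with the metric names above, the following is a minimal sketch of how MPJPE and Acceleration Error are commonly computed, not the evaluation code used for these results. It assumes joints are given as plain `(x, y, z)` tuples in the same units (these benchmarks usually report millimetres) and that acceleration is taken as a second-order finite difference over consecutive frames. The function names are illustrative. MPVPE applies the same distance formula to mesh vertices instead of joints, and PA-MPJPE additionally requires a Procrustes alignment step that is not shown here.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints.

    pred, gt: list of frames, each frame a list of (x, y, z) joints.
    """
    dists = [
        math.dist(p, g)
        for frame_pred, frame_gt in zip(pred, gt)
        for p, g in zip(frame_pred, frame_gt)
    ]
    return sum(dists) / len(dists)


def acceleration(seq):
    """Per-joint second-order finite difference: x[t-1] - 2*x[t] + x[t+1]."""
    return [
        [
            tuple(a - 2 * b + c for a, b, c in zip(j0, j1, j2))
            for j0, j1, j2 in zip(f0, f1, f2)
        ]
        for f0, f1, f2 in zip(seq, seq[1:], seq[2:])
    ]


def acceleration_error(pred, gt):
    """Mean distance between predicted and ground-truth joint accelerations."""
    return mpjpe(acceleration(pred), acceleration(gt))


# Toy sequence: one joint moving at constant velocity along x (assumed data).
gt = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
# A constant offset hurts position accuracy but not smoothness.
shifted = [[(x, y + 3.0, z + 4.0) for x, y, z in frame] for frame in gt]
print(mpjpe(shifted, gt))               # 5.0
print(acceleration_error(shifted, gt))  # 0.0
```

The toy example shows why the two kinds of metric are reported separately: a prediction can be consistently offset yet perfectly smooth, or accurate on average yet jittery.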

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/global-to-local-modeling-for-video-based-3d/3d-human-pose-estimation-on-3dpw)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-3dpw?p=global-to-local-modeling-for-video-based-3d)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/global-to-local-modeling-for-video-based-3d/3d-human-pose-estimation-on-mpi-inf-3dhp)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-mpi-inf-3dhp?p=global-to-local-modeling-for-video-based-3d)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/global-to-local-modeling-for-video-based-3d/3d-human-pose-estimation-on-human36m)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-human36m?p=global-to-local-modeling-for-video-based-3d)`

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

CVPR 2023 · Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness. Although these two metrics are responsible for different ranges of temporal consistency, existing state-of-the-art methods treat them as a unified problem and use monotonous modeling structures (e.g., RNN or attention-based blocks) to design their networks. However, a single kind of modeling structure struggles to balance the learning of short-term and long-term temporal correlations and may bias the network toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details. To solve these problems, we propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT). First, a global transformer is introduced with a Masked Pose and Shape Estimation strategy for long-term modeling. The strategy encourages the global transformer to learn more inter-frame correlations by randomly masking the features of several frames. Second, a local transformer is responsible for exploiting local details on the human mesh and interacting with the global transformer through cross-attention. Moreover, a Hierarchical Spatial Correlation Regressor is introduced to refine intra-frame estimations using decoupled global-local representations and implicit kinematic constraints. GLoT surpasses previous state-of-the-art methods with the fewest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M. Code is available at https://github.com/sxl142/GLoT.
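
The random frame masking described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the official repository for that); it only shows the general idea of replacing the features of a random subset of frames with a placeholder vector so that a model must rely on the remaining frames. The names `mask_frames`, `mask_ratio`, and `mask_token` are hypothetical, and plain Python lists stand in for feature tensors.

```python
import random


def mask_frames(features, mask_ratio, mask_token, rng):
    """Replace the features of randomly chosen frames with a mask token.

    features:   list of per-frame feature vectors (lists of floats)
    mask_ratio: fraction of frames to mask (hypothetical hyperparameter)
    mask_token: placeholder vector with the same length as one feature vector
    rng:        random.Random instance, passed in for reproducibility
    Returns the masked copy and the sorted indices of the masked frames.
    """
    num_frames = len(features)
    num_masked = int(num_frames * mask_ratio)
    masked_idx = set(rng.sample(range(num_frames), num_masked))
    masked = [
        list(mask_token) if t in masked_idx else list(frame)
        for t, frame in enumerate(features)
    ]
    return masked, sorted(masked_idx)


# Toy input: 16 frames with 4-dimensional features (assumed sizes).
rng = random.Random(0)
features = [[float(t)] * 4 for t in range(16)]
mask_token = [-1.0] * 4
masked, idx = mask_frames(features, 0.5, mask_token, rng)
print(len(idx), "of", len(features), "frames masked")
```

In a trained model the placeholder would typically be a learned vector rather than a constant; the constant here only keeps the sketch self-contained.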

Code

sxl142/glot official

Tasks

3D human pose and shape estimation

3D Human Pose Estimation

Datasets

Human3.6M

3DPW

MPI-INF-3DHP

Results from the Paper

Ranked #46 on 3D Human Pose Estimation on 3DPW

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
