TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Human Pose Estimation	Human3.6M	ConvFormer (T=243)	Average MPJPE (mm)	29.8	# 30
3D Human Pose Estimation	Human3.6M	ConvFormer (T=243)	Using 2D ground-truth joints	Yes	# 2
3D Human Pose Estimation	Human3.6M	ConvFormer (T=243)	Multi-View or Monocular	Monocular	# 1
3D Human Pose Estimation	Human3.6M	ConvFormer (T=243, CPN)	Average MPJPE (mm)	43.2	# 89
3D Human Pose Estimation	Human3.6M	ConvFormer (T=243, CPN)	Using 2D ground-truth joints	No	# 2
3D Human Pose Estimation	Human3.6M	ConvFormer (T=243, CPN)	Multi-View or Monocular	Monocular	# 1
3D Human Pose Estimation	HumanEva-I	ConvFormer (T=43)	Mean Reconstruction Error (mm)	24.3	# 16
3D Human Pose Estimation	MPI-INF-3DHP	ConvFormer	AUC	69.8	# 16
3D Human Pose Estimation	MPI-INF-3DHP	ConvFormer	MPJPE	53.6	# 18
3D Human Pose Estimation	MPI-INF-3DHP	ConvFormer	PCK	96.4	# 17
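
The MPJPE and PCK columns above are standard 3D pose metrics. The sketch below shows how they are commonly computed for a single pose. It is illustrative only and is not taken from the ConvFormer code base. The function names `mpjpe` and `pck` and the 150 mm default threshold (the value usually quoted for MPI-INF-3DHP) are assumptions. It does not cover the rigid alignment step used by protocols that report a reconstruction error.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) in millimetres


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def pck(pred: Sequence[Joint], gt: Sequence[Joint],
        threshold_mm: float = 150.0) -> float:
    """Percentage of joints whose error is at most threshold_mm.

    The 150 mm default is an assumption, not a value stated on this page.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold_mm)
    return 100.0 * hits / len(pred)


# Toy two-joint pose (hypothetical data): per-joint errors are 5 mm and 200 mm.
gt_pose = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
pred_pose = [(3.0, 4.0, 0.0), (100.0, 0.0, 200.0)]

print(mpjpe(pred_pose, gt_pose))  # 102.5
print(pck(pred_pose, gt_pose))    # 50.0
```

Benchmark numbers such as those in the table are averages of these per-pose values over all frames of the test set. AUC is typically obtained by sweeping the PCK threshold over a range and averaging.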

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/convformer-parameter-reduction-in-transformer/3d-human-pose-estimation-on-humaneva-i)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-humaneva-i?p=convformer-parameter-reduction-in-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/convformer-parameter-reduction-in-transformer/3d-human-pose-estimation-on-mpi-inf-3dhp)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-mpi-inf-3dhp?p=convformer-parameter-reduction-in-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/convformer-parameter-reduction-in-transformer/3d-human-pose-estimation-on-human36m)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-human36m?p=convformer-parameter-reduction-in-transformer)`

ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention

4 Apr 2023 · Alec Diaz-Arias, Dmitriy Shin

Recently, fully-transformer architectures have replaced the de facto convolutional architecture for the 3D human pose estimation task. In this paper we propose ConvFormer, a novel convolutional transformer that leverages a new dynamic multi-headed convolutional self-attention mechanism for monocular 3D human pose estimation. We designed a spatial and temporal convolutional transformer to comprehensively model human joint relations within individual frames and globally across the motion sequence. Moreover, we introduce a novel notion of temporal joints profile for our temporal ConvFormer that fuses complete temporal information immediately for a local neighborhood of joint features. We have quantitatively and qualitatively validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments have been conducted to identify the optimal hyper-parameter set. These experiments demonstrated that we achieved a significant parameter reduction relative to prior transformer models while attaining state-of-the-art (SOTA) or near-SOTA results on all three datasets. Additionally, we achieved SOTA for Protocol III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on all three metrics for the MPI-INF-3DHP dataset and for all three subjects on HumanEva under Protocol II.

Code

ajda1992/convformer

Tasks

3D Human Pose Estimation

Monocular 3D Human Pose Estimation

Pose Estimation

Datasets

Human3.6M

MPI-INF-3DHP

Results from the Paper

Ranked #16 on 3D Human Pose Estimation on HumanEva-I

Methods

Absolute Position Encodings • Adam • BPE • Convolution • CPN • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
