TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Human Pose Estimation	Human3.6M	PoseFormerV2 (f=27, T=243)	Average MPJPE (mm)	45.2	# 111
3D Human Pose Estimation	MPI-INF-3DHP	PoseFormerV2 (T=81)	AUC	78.8	# 8
3D Human Pose Estimation	MPI-INF-3DHP	PoseFormerV2 (T=81)	MPJPE	27.8	# 8
3D Human Pose Estimation	MPI-INF-3DHP	PoseFormerV2 (T=81)	PCK	97.9	# 11
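
For readers unfamiliar with the metric names in the table, the following is a minimal sketch of two of them, assuming the usual definitions: MPJPE as the mean Euclidean distance between predicted and ground-truth joints, and PCK as the percentage of joints whose error falls within a distance threshold. The function names, the toy coordinates, and the threshold value are illustrative assumptions, not taken from the paper or its code.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(predicted, target)) / len(predicted)


def pck(predicted, target, threshold):
    """Percentage of joints whose error is at most `threshold` (same unit as the joints)."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of joints")
    hits = sum(math.dist(p, g) <= threshold for p, g in zip(predicted, target))
    return 100.0 * hits / len(predicted)


# Toy two-joint pose in millimetres (made-up numbers, not dataset values).
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

print(mpjpe(predicted, target))               # joint errors are 0 and 5, so the mean is 2.5
print(pck(predicted, target, threshold=1.0))  # only the first joint is within 1 mm: 50.0
```

Lower MPJPE is better, while higher PCK is better. The benchmark figures in the table depend on each dataset's own evaluation protocol, which this sketch does not reproduce.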

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/poseformerv2-exploring-frequency-domain-for/3d-human-pose-estimation-on-mpi-inf-3dhp)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-mpi-inf-3dhp?p=poseformerv2-exploring-frequency-domain-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/poseformerv2-exploring-frequency-domain-for/3d-human-pose-estimation-on-human36m)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-human36m?p=poseformerv2-exploring-frequency-domain-for)`

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

CVPR 2023 · Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen

Recently, transformer-based methods have achieved significant success in sequential 2D-to-3D lifting for human pose estimation. As a pioneering work, PoseFormer captures the spatial relations of human joints in each video frame and the human dynamics across frames with cascaded transformer layers, and it has achieved impressive performance. However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) the length of the input joint sequence; (b) the quality of 2D joint detection. Existing methods typically apply self-attention to all frames of the input sequence, which causes a huge computational burden when the number of frames is increased to improve estimation accuracy, and they are not robust to the noise that naturally arises from the limited capability of 2D joint detectors. In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. With minimal modifications to PoseFormer, the proposed method effectively fuses features in both the time domain and the frequency domain, enjoying a better speed-accuracy trade-off than its precursor. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that the proposed approach significantly outperforms the original PoseFormer and other transformer-based variants. Code is released at https://github.com/QitaoZhao/PoseFormerV2.
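
The idea of a compact frequency-domain representation can be illustrated with a small sketch. It assumes a discrete cosine transform (DCT) as the frequency transform, which the abstract itself does not specify, and it is not the authors' implementation. The function names, the toy trajectory, and the choice of keeping 9 coefficients are all hypothetical. The sketch shows two things: a long sequence can be summarised by a few low-frequency coefficients, and discarding the high-frequency ones suppresses frame-to-frame jitter of the kind a noisy 2D detector might introduce.

```python
import math


def dct_ii(signal):
    """Orthonormal DCT-II of a 1D sequence (direct O(n^2) evaluation)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        total = sum(x * math.cos(math.pi * (t + 0.5) * k / n)
                    for t, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs):
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for t in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c * math.cos(math.pi * (t + 0.5) * k / n)
        signal.append(total)
    return signal


def low_pass(signal, keep):
    """Return the first `keep` DCT coefficients and the sequence rebuilt from them alone."""
    coeffs = dct_ii(signal)
    compact = coeffs[:keep]
    padded = compact + [0.0] * (len(coeffs) - keep)
    return compact, idct_ii(padded)


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy data: one coordinate of one joint over T = 81 frames (made-up values).
T = 81
clean = [math.cos(math.pi * (t + 0.5) * 2 / T) for t in range(T)]  # smooth motion
noisy = [x + 0.05 * (-1) ** t for t, x in enumerate(clean)]        # alternating jitter
compact, smoothed = low_pass(noisy, keep=9)                        # keep=9 is an arbitrary choice

print(len(compact))  # 9 coefficients stand in for 81 frames
print(rms_error(noisy, clean), rms_error(smoothed, clean))  # error before and after filtering
```

In this toy setting the smooth motion lies entirely in the retained low-frequency band, so the truncation only removes jitter. Real pose sequences are not this clean, and the number of coefficients to keep trades compactness against lost detail.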

Code

qitaozhao/poseformerv2 (official), 192 stars

zczcwh/DL-HPE, 444 stars

Tasks

3D Human Pose Estimation

Human Dynamics

Pose Estimation

Datasets

Human3.6M

MPI-INF-3DHP

Results from the Paper

Ranked #8 on 3D Human Pose Estimation on MPI-INF-3DHP
