TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Pose Estimation	J-HMDB	LSTM PM	Mean PCK@0.2	93.6	# 3
2D Human Pose Estimation	JHMDB (2D poses only)	LSTM PM	PCK	93.6	# 4
Pose Estimation	UPenn Action	LSTM PM	Mean PCK@0.2	97.7	# 3
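
The metric in the table, PCK@0.2, counts a predicted joint as correct when it falls within 0.2 times a reference length of the ground-truth joint. The page does not state the reference length, so the sketch below assumes it is the larger side of the person bounding box, and it pools all joints together rather than averaging per joint type. The function name `pck` and its arguments are hypothetical, chosen only for illustration.

```python
from math import hypot


def pck(pred, gt, bbox_sizes, alpha=0.2):
    """Percentage of joints predicted within alpha * reference length.

    pred, gt:    list of frames, each a list of (x, y) joint coordinates.
    bbox_sizes:  one reference length per frame (assumed here to be
                 max(bbox height, bbox width); the page does not specify it).
    """
    correct = 0
    total = 0
    for frame_pred, frame_gt, size in zip(pred, gt, bbox_sizes):
        threshold = alpha * size
        for (px, py), (gx, gy) in zip(frame_pred, frame_gt):
            # A joint counts as correct if its Euclidean error is within the threshold.
            if hypot(px - gx, py - gy) <= threshold:
                correct += 1
            total += 1
    return 100.0 * correct / total if total else 0.0


# One frame, two joints, reference length 100 -> threshold of 20 pixels.
pred = [[(10, 10), (50, 50)]]
gt = [[(12, 10), (80, 50)]]
score = pck(pred, gt, bbox_sizes=[100])  # first joint is off by 2, second by 30
```

In this toy input one of the two joints lies inside the threshold, so `score` is 50.0.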

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lstm-pose-machines/pose-estimation-on-j-hmdb)](https://paperswithcode.com/sota/pose-estimation-on-j-hmdb?p=lstm-pose-machines)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lstm-pose-machines/pose-estimation-on-upenn-action)](https://paperswithcode.com/sota/pose-estimation-on-upenn-action?p=lstm-pose-machines)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lstm-pose-machines/2d-human-pose-estimation-on-jhmdb-2d-poses)](https://paperswithcode.com/sota/2d-human-pose-estimation-on-jhmdb-2d-poses?p=lstm-pose-machines)`

LSTM Pose Machines

CVPR 2018 · Yue Luo, Jimmy Ren, Zhouxia Wang, Wenxiu Sun, Jinshan Pan, Jianbo Liu, Jiahao Pang, Liang Lin

We observed that recent state-of-the-art results on single-image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding their superior performance on static images, applying these models to videos is not only computationally intensive but also suffers from performance degradation and flickering. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency, to handle severe image quality degradation (e.g. motion blur and occlusion), and to capture the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if a weight-sharing scheme is imposed on the multi-stage CNN, it can be rewritten as a Recurrent Neural Network (RNN). This property decouples the relationship among the network stages and results in significantly faster speed when invoking the network on videos. It also enables the adoption of Long Short-Term Memory (LSTM) units between video frames. We found that such a memory-augmented RNN is very effective at imposing geometric consistency among frames. It also handles input quality degradation in videos well while successfully stabilizing the sequential outputs. The experiments showed that our approach significantly outperformed current state-of-the-art methods on two large-scale video pose estimation benchmarks. We also explored the memory cells inside the LSTM and provided insights into why such a mechanism would benefit video-based pose estimation.
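
The weight-sharing argument in the abstract can be illustrated with a toy example. The sketch below is not the authors' network: it uses scalar "features" and a made-up `stage` function, purely to show that a multi-stage model whose stages all share one weight set computes the same thing as a recurrent loop, and how a standard LSTM cell could carry memory from one frame to the next. All names and parameter values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stage(weights, feat, belief):
    """One refinement stage: mix the frame feature with the previous belief."""
    w_x, w_b, bias = weights
    return math.tanh(w_x * feat + w_b * belief + bias)


def multi_stage(stage_weights, feat, belief=0.0):
    """Feed-forward multi-stage model: one weight set per stage, one image."""
    for weights in stage_weights:
        belief = stage(weights, feat, belief)
    return belief


def recurrent(shared_weights, frame_feats, belief=0.0):
    """Recurrent form: a single weight set reused at every step (one per frame)."""
    beliefs = []
    for feat in frame_feats:
        belief = stage(shared_weights, feat, belief)
        beliefs.append(belief)
    return beliefs


def lstm_step(params, x, h, c):
    """Standard scalar LSTM cell; params maps gate name -> (w_x, w_h, bias)."""
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h + b)

    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("candidate", math.tanh)
    c_new = f * c + i * g          # memory cell keeps part of the past
    h_new = o * math.tanh(c_new)   # hidden state passed to the next frame
    return h_new, c_new


shared = (0.8, 0.5, 0.1)   # hypothetical weights
x = 0.3                    # hypothetical frame feature

# Three stages with identical weights == three recurrent steps on the same input.
feed_forward_out = multi_stage([shared] * 3, x)
recurrent_out = recurrent(shared, [x] * 3)[-1]
assert feed_forward_out == recurrent_out
```

Once the stages share weights, the loop in `recurrent` can consume a different frame feature at each step, and a cell such as `lstm_step` can replace the plain `stage` update to carry memory between frames.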


Code


lawy623/LSTM_Pose_Machines official

274 stars

Tasks


2D Human Pose Estimation

Pose Estimation

Datasets

JHMDB

Penn Action

Results from the Paper


Ranked #3 on Pose Estimation on J-HMDB



Methods


Convolution • LSTM • Sigmoid Activation • SPEED • Tanh Activation
