Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers

10 Mar 2023 · Vandad Davoodnia, Ali Etemad ·

Despite the impressive performance of vision-based pose estimators, they generally fail to perform well under adverse vision conditions and often don't satisfy the privacy demands of customers. As a result, researchers have begun to study tactile sensing systems as an alternative. However, these systems suffer from noisy and ambiguous recordings. To tackle this problem, we propose a novel solution for pose estimation from ambiguous pressure data. Our method comprises a spatio-temporal vision transformer with an encoder-decoder architecture. Detailed experiments on two popular public datasets reveal that our model outperforms existing solutions in the area. Moreover, we observe that increasing the number of temporal crops in the early stages of the network positively impacts the performance while pre-training the network in a self-supervised setting using a masked auto-encoder approach also further improves the results.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Pose Estimation

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Edit

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

Dense Connections • fail • Layer Normalization • Linear Layer • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Vision Transformer

Edit Social Preview

Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Edit

Methods

Add Remove