Epipolar Transformers

A common approach to localizing 3D human joints in a synchronized and calibrated multi-view setup consists of two steps: (1) apply a 2D detector separately to each view to localize joints in 2D, and (2) perform robust triangulation on the 2D detections from each view to obtain the 3D joint locations. However, in step 1, the 2D detector must resolve challenging cases, such as occlusions and oblique viewing angles, purely in 2D without leveraging any 3D information, even though these cases could potentially be better resolved in 3D...
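Step (2) of the pipeline described above can be illustrated with a small linear triangulation sketch. This is not the paper's implementation: it is a plain least-squares solve rather than a robust one, it uses only the standard library, and the names `project`, `solve3`, and `triangulate`, as well as the toy camera matrices in the usage note, are assumptions made for illustration. Each camera is assumed to be given as a 3x4 projection matrix.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def project(P: Matrix, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point X with a 3x4 camera matrix P; returns pixel (u, v)."""
    Xh = list(X) + [1.0]
    x, y, w = (sum(p * c for p, c in zip(row, Xh)) for row in P)
    return x / w, y / w


def solve3(M: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    A = [list(row) + [bi] for row, bi in zip(M, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("degenerate camera configuration")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    return x


def triangulate(cameras: Sequence[Matrix],
                points_2d: Sequence[Tuple[float, float]]) -> List[float]:
    """Least-squares 3D point from one 2D detection per calibrated view.

    Each view contributes two linear equations in (x, y, z):
        (u * P[2] - P[0]) . (x, y, z, 1) = 0
        (v * P[2] - P[1]) . (x, y, z, 1) = 0
    The stacked system is solved through its normal equations.
    """
    rows, rhs = [], []
    for P, (u, v) in zip(cameras, points_2d):
        for coord, prow in ((u, P[0]), (v, P[1])):
            r = [coord * P[2][k] - prow[k] for k in range(4)]
            rows.append(r[:3])
            rhs.append(-r[3])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)
```

As a usage example, two hypothetical cameras `[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]` and `[[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]` observing the point `(0.5, 0.2, 4.0)` give noise-free detections via `project`, and `triangulate` recovers the point up to floating-point error. With noisy detections, a robust variant (for example, one that discards outlier views) would typically be preferred.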

CVPR 2020

Results from the Paper


Ranked #2 on 3D Human Pose Estimation on Human3.6M (using extra training data)

TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK
3D Human Pose Estimation | Human3.6M | Epipolar Transformer+R152 384×384 | Average MPJPE (mm) | 19.0 | # 2
 | | | Using 2D ground-truth joints | No | # 1
 | | | Multi-View or Monocular | Multi-View | # 1
3D Human Pose Estimation | Human3.6M | Epipolar Transformer+R50 256×256+RPSM | Average MPJPE (mm) | 26.9 | # 4
 | | | Using 2D ground-truth joints | No | # 1
 | | | Multi-View or Monocular | Multi-View | # 1
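
The metric reported above, MPJPE (mean per-joint position error), is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints. The sketch below computes it for a single pose; the function name `mpjpe` and the toy joint coordinates in the usage note are assumptions for illustration, and benchmark protocols additionally average over frames and may align the root joint first.

```python
import math
from typing import Sequence


def mpjpe(pred: Sequence[Sequence[float]], gt: Sequence[Sequence[float]]) -> float:
    """Mean per-joint position error for one pose.

    pred and gt are equal-length sequences of 3D joint positions.
    The result is in the same unit as the inputs (mm for Human3.6M).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Average the per-joint Euclidean distances (math.dist needs Python 3.8+).
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)
```

For example, with made-up values, `mpjpe([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])` has per-joint errors of 0 and 5, so the result is 2.5.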

Methods used in the Paper


METHOD | TYPE
Residual Connection | Skip Connections
BPE | Subword Segmentation
Dense Connections | Feedforward Networks
Label Smoothing | Regularization
ReLU | Activation Functions
Adam | Stochastic Optimization
Softmax | Output Functions
Dropout | Regularization
Multi-Head Attention | Attention Modules
Layer Normalization | Normalization
Scaled Dot-Product Attention | Attention Mechanisms
Transformer | Transformers