3D human pose estimation in video with temporal convolutions and semi-supervised training

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data...
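The two ideas in the abstract are a stack of dilated temporal convolutions that maps a window of 2D keypoints to a 3D pose, and a back-projection loss that projects predicted 3D joints into the image and compares them with the input 2D keypoints. Below is a minimal NumPy sketch of both, under stated assumptions rather than the authors' implementation. It assumes kernel width 3 and dilations 1, 3, 9, 27, 81, chosen so the receptive field matches the 243 frames listed in the results. It uses random weights and omits residual connections, normalization and other details a full model would have. It also assumes an undistorted pinhole camera with hypothetical focal length and principal point. All function names are hypothetical.

```python
import numpy as np


def dilated_conv1d(x, w, b, dilation):
    """Valid (unpadded) 1D convolution over time.

    x: (T, C_in), w: (k, C_in, C_out), b: (C_out,).
    Returns (T - (k - 1) * dilation, C_out).
    """
    k = w.shape[0]
    t_out = x.shape[0] - (k - 1) * dilation
    out = np.zeros((t_out, w.shape[2]))
    for i in range(k):
        # Tap i looks i * dilation frames further along the sequence.
        out += x[i * dilation : i * dilation + t_out] @ w[i]
    return out + b


def receptive_field(kernel, dilations):
    # Each layer widens the window by (kernel - 1) * dilation frames.
    return 1 + sum((kernel - 1) * d for d in dilations)


def init_params(num_joints, channels, dilations, kernel=3, seed=0):
    # Hypothetical random initialisation; a trained model would learn these.
    rng = np.random.default_rng(seed)
    params, c_in = [], num_joints * 2
    for _ in dilations:
        params.append((rng.normal(0, 0.05, (kernel, c_in, channels)), np.zeros(channels)))
        c_in = channels
    # Final per-frame linear layer mapping features to J * 3 coordinates.
    params.append((rng.normal(0, 0.05, (c_in, num_joints * 3)), np.zeros(num_joints * 3)))
    return params


def temporal_model(kp2d, params, dilations):
    """Map 2D keypoints (T, J, 2) to 3D poses (T_out, J, 3), one per window."""
    t, j, _ = kp2d.shape
    h = kp2d.reshape(t, j * 2)  # flatten joints into channels
    for (w, b), d in zip(params[:-1], dilations):
        h = np.maximum(dilated_conv1d(h, w, b, d), 0.0)  # ReLU
    w_out, b_out = params[-1]
    return (h @ w_out + b_out).reshape(-1, j, 3)


def back_projection_loss(pose3d_cam, kp2d, focal, center):
    """Mean 2D distance between projected 3D joints and the input keypoints.

    Assumes pose3d_cam (N, J, 3) is in camera coordinates with Z > 0 and an
    undistorted pinhole camera (hypothetical focal length / principal point).
    No 3D labels are needed, which is what makes it usable on unlabeled video.
    """
    proj = focal * pose3d_cam[..., :2] / pose3d_cam[..., 2:3] + center
    return float(np.mean(np.linalg.norm(proj - kp2d, axis=-1)))


dilations = [1, 3, 9, 27, 81]
print(receptive_field(3, dilations))  # 243
params = init_params(num_joints=17, channels=64, dilations=dilations)
clip = np.random.default_rng(1).normal(size=(243, 17, 2))
print(temporal_model(clip, params, dilations).shape)  # (1, 17, 3)
```

Because every convolution is unpadded, a longer clip yields one pose per sliding 243-frame window. For example, 250 input frames give 8 outputs, so a whole sequence can be processed in a single pass.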

CVPR 2019

Task: 3D Human Pose Estimation | Dataset: Human3.6M | Model: Temporal convolution + semi-supervision
  Metric name                   Metric value  Global rank
  Average MPJPE (mm)            46.8          #20
  Using 2D ground-truth joints  No            #1
  Multi-View or Monocular       Monocular     #1

Task: Monocular 3D Human Pose Estimation | Dataset: Human3.6M | Model: Temporal convolution + semi-supervision
  Metric name                   Metric value  Global rank
  Average MPJPE (mm)            46.8          #4
  Use Video Sequence            Yes           #1
  Frames Needed                 243           #24
  Need Ground Truth 2D Pose     No            #1

Task: Weakly-supervised 3D Human Pose Estimation | Dataset: Human3.6M | Model: Pavllo et al.
  Metric name                   Metric value  Global rank
  Average MPJPE (mm)            64.7          #2
  Number of Views               1             #1
  Number of Frames Per View     243           #5
  3D Annotations                S1            #1
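The results are reported as Average MPJPE (mean per-joint position error) in millimetres. This is the Euclidean distance between predicted and ground-truth joints, averaged over all joints and frames. A minimal NumPy sketch follows, assuming both poses are arrays of shape (N, J, 3) in mm. The optional root alignment is an assumption on my part, since the exact evaluation protocol behind each benchmark entry is not given on this page. The function name `mpjpe` is hypothetical.

```python
import numpy as np


def mpjpe(pred, target, root=None):
    """Mean per-joint position error, in the units of the inputs (mm here).

    pred, target: arrays of shape (N, J, 3).
    root: optional joint index; if given, both poses are translated so that
    this joint sits at the origin before comparing (root-relative error).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if root is not None:
        pred = pred - pred[:, root:root + 1]
        target = target - target[:, root:root + 1]
    # Per-joint Euclidean distance, then the mean over frames and joints.
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))


target = np.zeros((1, 2, 3))
pred = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]])
print(mpjpe(pred, target))  # 2.5
```

With `root` set, a prediction that is correct up to a global translation scores zero. This separates pose accuracy from errors in the subject's absolute position.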
