Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

Exploring spatial-temporal dependencies from observed motions is one of the core challenges of human motion prediction. Previous methods mainly focus on dedicated network structures to model the spatial and temporal dependencies. This paper considers a new direction by introducing a model learning framework with auxiliary tasks. In our auxiliary tasks, a portion of the body joints' coordinates is corrupted by either masking or adding noise, and the goal is to recover the corrupted coordinates from the remaining coordinates. To work with auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery by capturing spatial-temporal dependencies. Through auxiliary tasks, the auxiliary-adapted transformer is encouraged to capture more comprehensive spatial-temporal dependencies among body joints' coordinates, leading to better feature learning. Extensive experiments show that our method outperforms state-of-the-art methods by margins of 7.2%, 3.7%, and 9.4% in terms of 3D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap, and 3DPW datasets, respectively. We also demonstrate that our method is more robust in missing-data and noisy-data cases. Code is available at https://github.com/MediaBrain-SJTU/AuxFormer.
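
The two corruptions described in the abstract (masking and adding noise) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `corrupt_motion`, the per-joint granularity, the zero fill value for masked joints, the Gaussian noise model, and the default ratios are all assumptions made for illustration.

```python
import random

def corrupt_motion(motion, mask_ratio=0.3, noise_ratio=0.3, noise_std=0.05, seed=0):
    """Build corrupted copies of a motion for two recovery-style auxiliary tasks.

    motion: list of frames, each frame a list of joints, each joint [x, y, z].
    Returns (masked, mask_flags, noisy, noise_flags). The flags mark which
    joints were corrupted, so a recovery loss could be restricted to them.
    All hyperparameters here are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    masked, mask_flags, noisy, noise_flags = [], [], [], []
    for frame in motion:
        m_frame, m_flag, n_frame, n_flag = [], [], [], []
        for joint in frame:
            # Masking: replace the joint's coordinates with zeros (assumed fill value).
            hide = rng.random() < mask_ratio
            m_frame.append([0.0, 0.0, 0.0] if hide else list(joint))
            m_flag.append(hide)
            # Noise: perturb each coordinate with zero-mean Gaussian noise (assumed model).
            perturb = rng.random() < noise_ratio
            n_frame.append(
                [c + rng.gauss(0.0, noise_std) for c in joint] if perturb else list(joint)
            )
            n_flag.append(perturb)
        masked.append(m_frame)
        mask_flags.append(m_flag)
        noisy.append(n_frame)
        noise_flags.append(n_flag)
    return masked, mask_flags, noisy, noise_flags

# Usage: a toy motion with 2 frames and 3 joints per frame.
toy_motion = [
    [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0], [2.0, 0.5, 0.1]],
    [[0.1, 1.1, 2.1], [1.1, 1.1, 1.1], [2.1, 0.6, 0.2]],
]
masked, mask_flags, noisy, noise_flags = corrupt_motion(toy_motion)
```

In such a setup, a model would receive `masked` or `noisy` as input and be trained to reproduce the original coordinates at the flagged positions, alongside the main prediction objective.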

ICCV 2023

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Human Pose Forecasting | 3DPW | AuxFormer | Average MPJPE (mm) 1000 msec | 107.45 | # 6
Human Pose Forecasting | Human3.6M | AuxFormer | Average MPJPE (mm) @ 1000 ms | 107 | # 5
Human Pose Forecasting | Human3.6M | AuxFormer | Average MPJPE (mm) @ 400ms | 54.1 | # 4
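
The metric reported above, mean per joint position error (MPJPE), is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames. The sketch below shows the basic computation only. The function name `mpjpe` and the nested-list input layout are assumptions, and the exact averaging protocol behind each benchmark entry (which frames or horizons are included) is not specified on this page.

```python
import math

def mpjpe(pred, target):
    """Mean per joint position error over all frames and joints.

    pred, target: lists of frames, each frame a list of joints, each joint
    [x, y, z]. The result is in the same unit as the inputs (e.g. mm).
    """
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += math.dist(p, t)  # Euclidean distance for one joint
            count += 1
    return total / count

# Usage: one frame, two joints. The first joint is exact, the second is
# off by a 3-4-5 triangle, so the per-joint errors are 0 and 5.
example_pred = [[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]]
example_target = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
example_error = mpjpe(example_pred, example_target)  # 2.5
```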

Methods