HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction

CVPR 2022 · Zikang Zhou, Luyao Ye, Jianping Wang, Kui Wu, Kejie Lu

Accurately predicting the future motions of surrounding traffic agents is critical for the safety of autonomous vehicles. Recently, vectorized approaches have dominated the motion prediction community due to their capability of capturing complex interactions in traffic scenes. However, existing methods neglect the symmetries of the problem and suffer from high computational cost, which makes real-time multi-agent motion prediction difficult to achieve without sacrificing prediction performance. To tackle this challenge, we propose the Hierarchical Vector Transformer (HiVT) for fast and accurate multi-agent motion prediction. By decomposing the problem into local context extraction and global interaction modeling, our method can effectively and efficiently model a large number of agents in the scene. We also propose a translation-invariant scene representation and rotation-invariant spatial learning modules, which extract features that are robust to geometric transformations of the scene and enable the model to make accurate predictions for multiple agents in a single forward pass. Experiments show that HiVT achieves state-of-the-art performance on the Argoverse motion forecasting benchmark with a small model size and can make fast multi-agent motion predictions.
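
To make the invariance claims above concrete, here is a minimal sketch of one common way to obtain them. It is an illustration under stated assumptions, not the authors' implementation: translation invariance is obtained by describing a track through per-step displacement vectors instead of absolute positions, and rotation invariance by expressing those vectors in a frame aligned with the agent's most recent motion direction. The function names (`to_displacements`, `rotate`, `canonicalize`) are hypothetical.

```python
import math

def to_displacements(track):
    """Turn absolute (x, y) positions into per-step displacement vectors.

    Shifting the whole scene adds the same offset to every position,
    so the differences between consecutive positions are unchanged.
    """
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]

def rotate(vectors, angle):
    """Rotate 2D vectors counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in vectors]

def canonicalize(track):
    """Express displacements in a frame aligned with the last motion direction.

    Assumption for this sketch: the agent's heading is approximated by the
    direction of its final displacement vector.
    """
    disp = to_displacements(track)
    dx, dy = disp[-1]
    heading = math.atan2(dy, dx)
    return rotate(disp, -heading)

# Example: the same track observed in a shifted and rotated scene
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
shifted = [(x + 5.0, y - 3.0) for x, y in track]
turned = rotate(track, 0.7)

print(to_displacements(track) == to_displacements(shifted))
print(canonicalize(track))
print(canonicalize(turned))
```

In this sketch the shifted track yields identical displacement vectors, and the rotated track yields the same canonical vectors up to floating-point error, which is the property that lets a model process every agent in its own local frame.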


Task                Dataset              Model     Metric Name         Metric Value  Global Rank
Motion Forecasting  Argoverse CVPR 2020  HiVT++    MR (K=6)            0.1221        # 224
                                                   minADE (K=1)        1.5619        # 237
                                                   minFDE (K=1)        3.4449        # 235
                                                   MR (K=1)            0.5431        # 238
                                                   minADE (K=6)        0.7673        # 242
                                                   minFDE (K=6)        1.146         # 239
                                                   DAC (K=6)           0.9891        # 19
                                                   brier-minFDE (K=6)  1.8171        # 15
Motion Forecasting  Argoverse CVPR 2020  HiVT-128  MR (K=6)            0.1267        # 215
                                                   minADE (K=1)        1.5984        # 229
                                                   minFDE (K=1)        3.5328        # 223
                                                   MR (K=1)            0.5473        # 236
                                                   minADE (K=6)        0.7735        # 240
                                                   minFDE (K=6)        1.1693        # 234
                                                   DAC (K=6)           0.9888        # 25
                                                   brier-minFDE (K=6)  1.8422        # 18
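
For readers unfamiliar with the metric names in the table, the following sketch shows how minADE, minFDE, and MR are commonly computed from K predicted trajectories per scenario. It is a simplified illustration, not the benchmark's official evaluation code, which may differ in details such as how the best mode is selected. The 2.0 m miss threshold and all function names are assumptions of this sketch; DAC and brier-minFDE are not covered.

```python
import math

def ade(pred, gt):
    """Average displacement error: mean Euclidean distance over all timesteps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def fde(pred, gt):
    """Final displacement error: Euclidean distance at the last timestep."""
    return math.dist(pred[-1], gt[-1])

def min_ade(modes, gt):
    """Smallest ADE among the K predicted trajectories (modes)."""
    return min(ade(m, gt) for m in modes)

def min_fde(modes, gt):
    """Smallest FDE among the K predicted trajectories (modes)."""
    return min(fde(m, gt) for m in modes)

def miss_rate(scenarios, threshold=2.0):
    """Fraction of scenarios whose best endpoint is farther than `threshold`.

    `scenarios` is a list of (modes, gt) pairs; the 2.0 m default is an
    assumption of this sketch.
    """
    misses = sum(1 for modes, gt in scenarios if min_fde(modes, gt) > threshold)
    return misses / len(scenarios)

# Example with two toy scenarios that share one ground-truth track
gt = [(1.0, 0.0), (2.0, 0.0)]
good = [[(1.0, 0.0), (2.0, 0.0)], [(1.0, 3.0), (2.0, 4.0)]]  # K=2, one exact mode
bad = [[(1.0, 3.0), (2.0, 3.0)]]                              # K=1, 3 m off

print(min_ade(good, gt), min_fde(good, gt))
print(min_ade(bad, gt), min_fde(bad, gt))
print(miss_rate([(good, gt), (bad, gt)]))
```

Lower values are better for all three metrics. With K=1 only the single most likely trajectory is scored, which is why the K=1 rows in the table are markedly worse than the K=6 rows.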