HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction

CVPR 2022 · Zikang Zhou, Luyao Ye, JianPing Wang, Kui Wu, Kejie Lu

Accurately predicting the future motions of surrounding traffic agents is critical for the safety of autonomous vehicles. Recently, vectorized approaches have dominated the motion prediction community due to their capability of capturing complex interactions in traffic scenes. However, existing methods neglect the symmetries of the problem and suffer from expensive computational cost, which makes real-time multi-agent motion prediction difficult to achieve without sacrificing prediction performance. To tackle this challenge, we propose the Hierarchical Vector Transformer (HiVT) for fast and accurate multi-agent motion prediction. By decomposing the problem into local context extraction and global interaction modeling, our method can effectively and efficiently model a large number of agents in the scene. Meanwhile, we propose a translation-invariant scene representation and rotation-invariant spatial learning modules, which extract features robust to geometric transformations of the scene and enable the model to make accurate predictions for multiple agents in a single forward pass. Experiments show that HiVT achieves state-of-the-art performance on the Argoverse motion forecasting benchmark with a small model size and can make fast multi-agent motion predictions.
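To make the idea of a translation-invariant scene representation concrete, the following minimal sketch encodes a trajectory as displacement vectors between consecutive waypoints, so that shifting the whole scene leaves the representation unchanged. This is an illustration only, not the HiVT implementation: the helper names `to_relative` and `translate` and the sample coordinates are hypothetical, and the abstract does not specify how the representation is actually built.

```python
def to_relative(positions):
    """Encode absolute (x, y) waypoints as displacement vectors between
    consecutive time steps (hypothetical helper, not the HiVT code)."""
    return [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]


def translate(positions, dx, dy):
    """Shift every waypoint by the same offset (dx, dy)."""
    return [(x + dx, y + dy) for x, y in positions]


# Made-up sample trajectory in an arbitrary world frame.
trajectory = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 3.0)]
shifted = translate(trajectory, 100.0, -40.0)

original_vectors = to_relative(trajectory)
shifted_vectors = to_relative(shifted)

# The absolute coordinates differ, but the displacement vectors do not.
print(original_vectors == shifted_vectors)
```

Because only differences of positions enter the representation, the choice of coordinate origin has no effect on the features. Rotation invariance requires an additional step (for example, expressing vectors relative to a per-agent heading) and is not covered by this sketch.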



Task                Dataset              Model     Metric Name         Metric Value  Global Rank
Motion Forecasting  Argoverse CVPR 2020  HiVT++    MR (K=6)            0.1221        # 267
                                                   minADE (K=1)        1.5619        # 285
                                                   minFDE (K=1)        3.4449        # 284
                                                   MR (K=1)            0.5431        # 286
                                                   minADE (K=6)        0.7673        # 290
                                                   minFDE (K=6)        1.146         # 287
                                                   DAC (K=6)           0.9891        # 30
                                                   brier-minFDE (K=6)  1.8171        # 24
Motion Forecasting  Argoverse CVPR 2020  HiVT-128  MR (K=6)            0.1267        # 259
                                                   minADE (K=1)        1.5984        # 274
                                                   minFDE (K=1)        3.5328        # 270
                                                   MR (K=1)            0.5473        # 282
                                                   minADE (K=6)        0.7735        # 286
                                                   minFDE (K=6)        1.1693        # 278
                                                   DAC (K=6)           0.9888        # 36
                                                   brier-minFDE (K=6)  1.8422        # 28
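
For readers unfamiliar with the metrics in the table, the sketch below shows how minADE, minFDE, and MR are commonly computed from K candidate trajectories per agent. It assumes the usual definitions (errors are Euclidean distances in meters, and a scenario counts as a miss when the best final-point error exceeds 2.0 m); the function names and sample data are hypothetical, and this is not the official benchmark evaluation code. DAC and brier-minFDE are not covered.

```python
import math


def displacement_errors(prediction, ground_truth):
    """Per-time-step Euclidean distance between two (x, y) trajectories."""
    return [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(prediction, ground_truth)
    ]


def min_ade(candidates, ground_truth):
    """Lowest average displacement error among the K candidates."""
    return min(
        sum(errors) / len(errors)
        for errors in (displacement_errors(c, ground_truth) for c in candidates)
    )


def min_fde(candidates, ground_truth):
    """Lowest final-step displacement error among the K candidates."""
    return min(displacement_errors(c, ground_truth)[-1] for c in candidates)


def miss_rate(scenarios, threshold=2.0):
    """Fraction of scenarios whose minFDE exceeds the threshold.
    The 2.0 m default is an assumption, not stated in this document."""
    misses = sum(1 for cands, gt in scenarios if min_fde(cands, gt) > threshold)
    return misses / len(scenarios)


# Made-up data: one ground-truth path and two candidate predictions.
ground_truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
offset_path = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]   # 1 m off at every step
drifting_path = [(1.0, 0.0), (2.0, 0.0), (6.0, 4.0)]  # 5 m off at the end

ade = min_ade([offset_path, drifting_path], ground_truth)
fde = min_fde([offset_path, drifting_path], ground_truth)
mr = miss_rate([
    ([offset_path, drifting_path], ground_truth),  # best FDE 1 m: hit
    ([drifting_path], ground_truth),               # best FDE 5 m: miss
])
print(ade, fde, mr)
```

Under these definitions, K=1 rows score only the single most likely prediction, while K=6 rows take the best of six candidates, which is why the K=6 errors in the table are lower.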