Prompting for Multi-Modal Tracking

29 Jul 2022 · Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song

Multi-modal tracking has gained attention because it is more accurate and robust in complex scenarios than traditional RGB-based tracking. The key lies in how to fuse multi-modal data and reduce the gap between modalities. However, multi-modal tracking still suffers severely from data deficiency, which leads to insufficient learning of fusion modules. Instead of building such a fusion module, this paper offers a new perspective on multi-modal tracking by attaching importance to multi-modal visual prompts. We design a novel multi-modal prompt tracker (ProTrack), which transfers multi-modal inputs into a single modality via the prompt paradigm. By fully exploiting the tracking ability of pre-trained RGB trackers learned at scale, ProTrack achieves high-performance multi-modal tracking by only altering the inputs, even without any extra training on multi-modal data. Extensive experiments on five benchmark datasets demonstrate the effectiveness of the proposed ProTrack.
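To make the prompt idea concrete, below is a minimal sketch (not the authors' released code) of how a paired RGB/auxiliary frame might be converted into a single RGB-like input that a frozen, pre-trained RGB tracker can consume unchanged. The function name `multimodal_prompt`, the blending weight `gamma`, and the colormap choice are illustrative assumptions; the paper's exact prompt construction may differ.

```python
import numpy as np
import cv2  # assumed available for normalization and pseudo-color rendering


def multimodal_prompt(rgb: np.ndarray, aux: np.ndarray, gamma: float = 0.6) -> np.ndarray:
    """Blend an RGB frame with an auxiliary modality (e.g. thermal or depth)
    into one RGB-like "prompt" frame for an off-the-shelf RGB tracker.

    rgb:   (H, W, 3) uint8 color frame.
    aux:   (H, W) single-channel auxiliary map (thermal, depth, ...).
    gamma: hypothetical blending weight between the two modalities.
    """
    # Normalize the auxiliary map to [0, 255] and render it as a
    # 3-channel pseudo-color image so both inputs share one format.
    aux_norm = cv2.normalize(aux, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    aux_rgb = cv2.applyColorMap(aux_norm, cv2.COLORMAP_JET)

    # Convex combination of the two modalities acts as the visual prompt:
    # the fusion happens in input space, so no fusion module is trained.
    prompt = gamma * rgb.astype(np.float32) + (1.0 - gamma) * aux_rgb.astype(np.float32)
    return prompt.clip(0, 255).astype(np.uint8)
```

The prompt frame is then fed to any pre-trained RGB tracker as-is, which is the appeal of the approach: the tracker's weights stay frozen, sidestepping the scarcity of multi-modal training data.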


Results from the Paper


Task            Dataset   Model      Metric      Value   Global Rank
RGB-T Tracking  LasHeR    ProTrack   Precision   50.9    #12
RGB-T Tracking  LasHeR    ProTrack   Success     42.1    #12
RGB-T Tracking  RGBT234   ProTrack   Precision   78.6    #19
RGB-T Tracking  RGBT234   ProTrack   Success     58.7    #13
