STMTrack: Template-free Visual Tracking with Space-time Memory Networks

CVPR 2021 · Zhihong Fu, Qingjie Liu, Zehua Fu, Yunhong Wang

Boosting the performance of offline-trained Siamese trackers is getting harder, since the fixed information in the template cropped from the first frame has been almost thoroughly mined, yet these trackers remain poorly equipped to resist target appearance changes. Existing trackers with template-updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance, which hinders real-time tracking and practical applications. In this paper, we propose a novel tracking framework built on top of a space-time memory network that makes full use of historical information related to the target to better adapt to appearance variations during tracking. Specifically, a novel memory mechanism is introduced, which stores the historical information of the target to guide the tracker to focus on the most informative regions in the current frame. Furthermore, the pixel-level similarity computation of the memory network enables our tracker to generate much more accurate bounding boxes of the target. Extensive experiments and comparisons with many competitive trackers on challenging large-scale benchmarks (OTB-2015, TrackingNet, GOT-10k, LaSOT, UAV123, and VOT2018) show that, without bells and whistles, our tracker outperforms all previous state-of-the-art real-time methods while running at 37 FPS. The code is available at
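
To make the pixel-level similarity computation mentioned in the abstract more concrete, here is a minimal NumPy sketch of a space-time memory read. It is an illustration under stated assumptions, not the authors' implementation: the function name `memory_read`, the tensor layout, the use of plain dot-product similarity, and the reuse of the same memory features as both keys and values are all hypothetical choices made for this example.

```python
import numpy as np


def memory_read(memory_feats, query_feats):
    """Read historical target information for every query pixel.

    memory_feats: array of shape (T, H, W, C), features of T stored frames.
    query_feats:  array of shape (H, W, C), features of the current frame.
    Returns an array of shape (H, W, 2 * C).

    Assumption: dot-product similarity and a softmax over all memory pixels.
    """
    T, H, W, C = memory_feats.shape
    m = memory_feats.reshape(T * H * W, C)  # every memory pixel as a row
    q = query_feats.reshape(H * W, C)       # every query pixel as a row

    # Pixel-level similarity between each memory pixel and each query pixel.
    sim = m @ q.T                                   # shape (T*H*W, H*W)

    # Numerically stable softmax over the memory axis (axis 0).
    sim = sim - sim.max(axis=0, keepdims=True)
    weights = np.exp(sim)
    weights /= weights.sum(axis=0, keepdims=True)

    # Each query pixel retrieves a weighted sum of memory features.
    read = weights.T @ m                            # shape (H*W, C)

    # Concatenate the retrieved information with the query features.
    return np.concatenate([read, q], axis=1).reshape(H, W, 2 * C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    memory = rng.random((3, 4, 4, 8))   # 3 stored frames
    query = rng.random((4, 4, 8))       # current frame
    print(memory_read(memory, query).shape)
```

Because the softmax runs over every pixel of every stored frame, each location in the current frame can draw on whichever historical pixels resemble it most, which is one way to read the abstract's claim about adapting to appearance variations.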


Results from the Paper

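Task                    Dataset   Model     Metric Name        Metric Value  Global Rank
----------------------  --------  --------  -----------------  ------------  -----------
Visual Object Tracking  GOT-10k   STMTrack  Average Overlap    64.2          #24
Visual Object Tracking  GOT-10k   STMTrack  Success Rate 0.5   73.7          #20
Visual Object Tracking  GOT-10k   STMTrack  Success Rate 0.75  57.5          #16
Visual Object Tracking  OTB-2015  STMTrack  AUC                0.719         #3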
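
For readers unfamiliar with the GOT-10k metric names, the sketch below shows one common way to compute Average Overlap and Success Rate from per-frame boxes. It assumes boxes in `(x, y, w, h)` format and simple per-frame averaging over a single sequence; the function names are hypothetical, and the official benchmark's aggregation across sequences may differ. The results reported for STMTrack appear to be these quantities expressed as percentages.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(predictions, ground_truths):
    """Mean IoU between predicted and ground-truth boxes over all frames."""
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(overlaps) / len(overlaps)


def success_rate(predictions, ground_truths, threshold):
    """Fraction of frames whose IoU exceeds the given threshold."""
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [(0, 0, 10, 10), (5, 0, 10, 10)]
    print(average_overlap(preds, gts))
    print(success_rate(preds, gts, 0.5))
    print(success_rate(preds, gts, 0.75))
```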