ODTrack: Online Dense Temporal Token Learning for Visual Tracking

3 Jan 2024 · Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, Xianxian Li

Online contextual reasoning and association across consecutive video frames are critical for perceiving instances in visual tracking. However, most current top-performing trackers rely on sparse temporal relationships between reference and search frames, learned in an offline mode. Consequently, they can only interact independently within each image pair and establish limited temporal correlations. To alleviate this problem, we propose a simple, flexible and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames through online token propagation. ODTrack receives video frames of arbitrary length to capture the spatio-temporal trajectory relationships of an instance, and compresses the discriminative features (localization information) of a target into a token sequence to achieve frame-to-frame association. This solution brings the following benefits: 1) the purified token sequences can serve as prompts for inference in the next video frame, whereby past information is leveraged to guide future inference; 2) complex online update strategies are avoided through the iterative propagation of token sequences, which yields more efficient model representation and computation. ODTrack achieves new state-of-the-art (SOTA) performance on seven benchmarks while running at real-time speed. Code and models are available at https://github.com/GXNU-ZhongLab/ODTrack.
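
The token propagation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single temporal token, plain scaled dot-product attention without learned projections, and random features standing in for backbone outputs. The names `attend` and `propagate_tokens` are hypothetical and chosen only for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, context):
    # Single-head scaled dot-product attention with no learned weights
    # (an assumption made to keep the sketch short).
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)
    return weights @ context

def propagate_tokens(frames, init_token):
    # frames: (T, N, D) array of N patch features per frame (hypothetical layout).
    # init_token: (1, D) starting temporal token.
    token = init_token
    history = []
    for feats in frames:
        # The token from the previous frame is used as a prompt: it is
        # concatenated with the current frame's features and then updated
        # by attending over that joint context.
        context = np.concatenate([token, feats], axis=0)
        token = attend(token, context)
        history.append(token)
    return np.stack(history)  # shape (T, 1, D)

rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 16, 8))
tokens = propagate_tokens(frames, np.zeros((1, 8)))
print(tokens.shape)
```

Because each token is computed from the previous one, information from early frames reaches later frames without a separate template-update step, which is the property the abstract attributes to iterative propagation.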


Results from the Paper

Task                                       Dataset      Model      Metric Name      Metric Value  Global Rank
Visual Object Tracking                     GOT-10k      ODTrack-B  Average Overlap  77.0          #4
Visual Object Tracking                     GOT-10k      ODTrack-L  Average Overlap  78.2          #3
Visual Object Tracking                     LaSOT        ODTrack-B  AUC              73.2          #4
Visual Object Tracking                     LaSOT        ODTrack-L  AUC              74.0          #1
Visual Object Tracking                     LaSOT-ext    ODTrack-L  AUC              53.9          #2
Visual Object Tracking                     LaSOT-ext    ODTrack-B  AUC              52.4          #6
Visual Object Tracking                     OTB-2015     ODTrack-B  AUC              0.723         #2
Visual Object Tracking                     OTB-2015     ODTrack-L  AUC              0.724         #1
Visual Object Tracking                     TNL2K        ODTrack-B  AUC              60.9          #3
Visual Object Tracking                     TNL2K        ODTrack-L  AUC              61.7          #1
Visual Object Tracking                     TrackingNet  ODTrack-B  Accuracy         85.1          #7
Visual Object Tracking                     TrackingNet  ODTrack-L  Accuracy         86.1          #1
Semi-Supervised Video Object Segmentation  VOT2020      ODTrack-B  EAO              0.581         #7
Semi-Supervised Video Object Segmentation  VOT2020      ODTrack-L  EAO              0.605         #3

