Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking

CVPR 2023 · Xin Chen, Ben Kang, Jiawen Zhu, Dong Wang, Houwen Peng, Huchuan Lu

In this paper, we introduce a new sequence-to-sequence learning framework for RGB-based and multi-modal object tracking. First, we present SeqTrack for RGB-based tracking. It casts visual tracking as a sequence generation task, forecasting object bounding boxes in an autoregressive manner. This differs from previous trackers, which depend on the design of intricate head networks, such as classification and regression heads. SeqTrack employs a basic encoder-decoder transformer architecture. The encoder utilizes a bidirectional transformer for feature extraction, while the decoder generates bounding box sequences autoregressively using a causal transformer. The loss function is a plain cross-entropy. Second, we introduce SeqTrackv2, a unified sequence-to-sequence framework for multi-modal tracking tasks. Expanding upon SeqTrack, SeqTrackv2 integrates a unified interface for auxiliary modalities and a set of task-prompt tokens to specify the task. This enables it to manage multi-modal tracking tasks using a unified model and parameter set. This sequence learning paradigm not only simplifies the tracking framework, but also showcases superior performance across 14 challenging benchmarks spanning five single- and multi-modal tracking tasks. The code and models are available at https://github.com/chenxin-dlut/SeqTrackv2.
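To make the sequence-generation framing concrete, the following is a minimal sketch of how a bounding box could be turned into a short token sequence, scored with cross-entropy, and decoded one token at a time. It is an illustration under stated assumptions, not the authors' implementation: the bin count `NUM_BINS`, the `[x, y, w, h]` coordinate order, and the helper names `box_to_tokens`, `tokens_to_box`, `cross_entropy`, and `greedy_decode` are all hypothetical and do not come from the abstract.

```python
import math
from typing import Callable, List, Sequence

NUM_BINS = 1000  # assumed size of the coordinate vocabulary (hypothetical value)


def box_to_tokens(box: Sequence[float], image_size: float,
                  num_bins: int = NUM_BINS) -> List[int]:
    """Quantize a box (assumed order [x, y, w, h], in pixels) into integer tokens."""
    tokens = []
    for value in box:
        normalized = min(max(value / image_size, 0.0), 1.0)  # clamp to [0, 1]
        tokens.append(min(int(normalized * num_bins), num_bins - 1))
    return tokens


def tokens_to_box(tokens: Sequence[int], image_size: float,
                  num_bins: int = NUM_BINS) -> List[float]:
    """Map tokens back to pixel coordinates using the centre of each bin."""
    return [(t + 0.5) / num_bins * image_size for t in tokens]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Plain cross-entropy for one token: log-sum-exp of logits minus target logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[target]


def greedy_decode(next_token_logits: Callable[[List[int]], Sequence[float]],
                  length: int = 4) -> List[int]:
    """Autoregressive decoding: each token is chosen given the tokens so far.

    `next_token_logits` stands in for a causal decoder; it is a placeholder,
    not a real model.
    """
    tokens: List[int] = []
    for _ in range(length):
        logits = next_token_logits(tokens)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


if __name__ == "__main__":
    box = [96.0, 192.0, 48.0, 384.0]
    tokens = box_to_tokens(box, image_size=384)
    print(tokens)
    print(tokens_to_box(tokens, image_size=384))
```

In this sketch the quantization error is bounded by one bin width (`image_size / num_bins`), which is the price paid for replacing a regression head with a classification over a fixed vocabulary.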


Results from the Paper


Task                    Dataset       Model            Metric                Value  Global Rank
Visual Object Tracking  GOT-10k       SeqTrack-L384    Average Overlap       74.8   #10
Visual Object Tracking  GOT-10k       SeqTrack-L384    Success Rate 0.5      81.9   #9
Visual Object Tracking  GOT-10k       SeqTrack-L384    Success Rate 0.75     72.2   #7
RGB-T Tracking          LasHeR        SeqTrackv2-L256  Precision             74.1   #2
RGB-T Tracking          LasHeR        SeqTrackv2-L256  Success               58.8   #2
RGB-T Tracking          LasHeR        SeqTrackv2-B384  Precision             71.5   #4
RGB-T Tracking          LasHeR        SeqTrackv2-B384  Success               56.2   #6
RGB-T Tracking          LasHeR        SeqTrackv2-B256  Precision             70.4   #5
RGB-T Tracking          LasHeR        SeqTrackv2-B256  Success               55.8   #8
RGB-T Tracking          LasHeR        SeqTrackv2-L384  Precision             76.7   #1
RGB-T Tracking          LasHeR        SeqTrackv2-L384  Success               61.0   #1
Visual Object Tracking  LaSOT         SeqTrack-L384    AUC                   72.5   #6
Visual Object Tracking  LaSOT         SeqTrack-L384    Normalized Precision  81.5   #6
Visual Object Tracking  LaSOT         SeqTrack-L384    Precision             79.3   #5
Visual Object Tracking  LaSOT-ext     SeqTrack-L384    AUC                   50.7   #7
Visual Object Tracking  LaSOT-ext     SeqTrack-L384    Normalized Precision  61.6   #4
Visual Object Tracking  LaSOT-ext     SeqTrack-L384    Precision             57.5   #6
Visual Object Tracking  NeedForSpeed  SeqTrack-L384    AUC                   0.662  #3
Visual Object Tracking  OTB-2015      SeqTrack-L384    AUC                   0.683  #8
RGB-T Tracking          RGBT234       SeqTrackv2-L384  Precision             91.3   #2
RGB-T Tracking          RGBT234       SeqTrackv2-L384  Success               68.0   #2
RGB-T Tracking          RGBT234       SeqTrackv2-B256  Precision             88.0   #5
RGB-T Tracking          RGBT234       SeqTrackv2-B256  Success               64.7   #6
RGB-T Tracking          RGBT234       SeqTrackv2-L256  Precision             92.3   #1
RGB-T Tracking          RGBT234       SeqTrackv2-L256  Success               68.5   #1
RGB-T Tracking          RGBT234       SeqTrackv2-B384  Precision             90.0   #3
RGB-T Tracking          RGBT234       SeqTrackv2-B384  Success               66.3   #3
Visual Object Tracking  TNL2K         SeqTrack-L384    AUC                   57.8   #5
Visual Object Tracking  TrackingNet   SeqTrack-L384    Precision             85.8   #5
Visual Object Tracking  TrackingNet   SeqTrack-L384    Normalized Precision  89.8   #3
Visual Object Tracking  TrackingNet   SeqTrack-L384    Accuracy              85.5   #5
Visual Object Tracking  UAV123        SeqTrack-L384    AUC                   0.685  #8
