A Distractor-Aware Memory for Visual Object Tracking with SAM2

26 Nov 2024  ·  Jovana Videnovic, Alan Lukezic, Matej Kristan

Memory-based trackers are video object segmentation methods that form the target model by concatenating recently tracked frames into a memory buffer and localize the target by attending the current image to the buffered frames. While such trackers already achieve top performance on many benchmarks, it was the recent release of SAM2 that brought memory-based trackers into the focus of the visual object tracking community. Nevertheless, modern trackers still struggle in the presence of distractors. We argue that a more sophisticated memory model is required, and propose a new distractor-aware memory model for SAM2 together with an introspection-based update strategy that jointly addresses segmentation accuracy and tracking robustness. The resulting tracker is denoted SAM2.1++. We also propose a new distractor-distilled DiDi dataset to better study the distractor problem. SAM2.1++ outperforms SAM2.1 and related SAM memory extensions on seven benchmarks and sets a solid new state-of-the-art on six of them.
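
The memory-based tracking loop described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' implementation: MemoryBuffer, attend_to_memory, and should_update are hypothetical names, and the confidence and distractor scores are placeholders standing in for the introspection signals the paper computes. It only shows the general pattern under these assumptions: buffered frame features carry the target masks, the current frame cross-attends to them to propagate the mask, and frames are committed to memory selectively rather than unconditionally.

```python
# A minimal sketch, assuming encoder features flattened to shape (C, H*W).
import torch
import torch.nn.functional as F


class MemoryBuffer:
    """Fixed-size FIFO buffer of (features, mask) pairs from recently tracked frames."""

    def __init__(self, capacity: int = 7):
        self.capacity = capacity
        self.frames = []  # each entry: (C, M) frame features
        self.masks = []   # each entry: (1, M) soft target mask

    def add(self, feat: torch.Tensor, mask: torch.Tensor) -> None:
        self.frames.append(feat)
        self.masks.append(mask)
        if len(self.frames) > self.capacity:  # evict the oldest frame
            self.frames.pop(0)
            self.masks.pop(0)


def attend_to_memory(query_feat: torch.Tensor, buffer: MemoryBuffer) -> torch.Tensor:
    """Cross-attend current-frame features (queries) to buffered features (keys)
    and propagate the stored target masks (values) to the current frame."""
    keys = torch.cat(buffer.frames, dim=1)               # (C, N)
    values = torch.cat(buffer.masks, dim=1)              # (1, N)
    scores = query_feat.T @ keys / keys.shape[0] ** 0.5  # (M, N) scaled dot products
    attn = F.softmax(scores, dim=-1)
    return (attn @ values.T).T                           # (1, M) propagated soft mask


def should_update(confidence: float, distractor_score: float,
                  conf_thr: float = 0.8, distr_thr: float = 0.5) -> bool:
    """Hypothetical introspection-style gate: commit a frame to memory only when
    the prediction is confident and not dominated by a nearby distractor."""
    return confidence > conf_thr and distractor_score < distr_thr


# Toy loop with random tensors standing in for an image encoder.
C, M = 256, 64 * 64
buffer = MemoryBuffer()
buffer.add(torch.randn(C, M), torch.rand(1, M))  # initialize from the first (annotated) frame
for _ in range(3):                               # subsequent frames
    cur = torch.randn(C, M)
    prop_mask = attend_to_memory(cur, buffer)    # (1, M) soft mask for the current frame
    # confidence / distractor_score would come from the tracker's introspection;
    # fixed values are used here only to keep the sketch runnable.
    if should_update(confidence=0.9, distractor_score=0.1):
        buffer.add(cur, prop_mask)
```

In SAM2 itself the memory is maintained by a dedicated memory encoder and memory attention module; the distractor-aware memory proposed in the paper replaces the plain recency-based update sketched above with a more selective, introspection-based strategy.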


Datasets

Introduced in the Paper: DiDi

Used in the Paper: LaSOT, GOT-10k, VOTChallenge, VOT2020, VOT2022

Results from the Paper


Task | Dataset | Model | Metric | Value | Global Rank
Visual Object Tracking | DiDi | DAM4SAM | Tracking quality | 0.694 | #1
Visual Object Tracking | GOT-10k | DAM4SAM | Average Overlap | 81.1 | #2
Visual Object Tracking | LaSOT | DAM4SAM | AUC | 75.1 | #4
Visual Object Tracking | LaSOT-ext | DAM4SAM | AUC | 60.9 | #2
Semi-Supervised Video Object Segmentation | VOT2020 | DAM4SAM | EAO | 0.729 | #1
Visual Object Tracking | VOT2022 | DAM4SAM | EAO | 0.753 | #1
