Visual Tracking
170 papers with code • 9 benchmarks • 26 datasets
Visual Tracking is an essential and actively researched problem in the field of computer vision with various real-world applications such as robotic services, smart surveillance systems, autonomous driving, and human-computer interaction. It refers to the automatic estimation of the trajectory of an arbitrary target object, usually specified by a bounding box in the first frame, as it moves around in subsequent video frames.
Source: Learning Reinforced Attentional Representation for End-to-End Visual Tracking
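In code, this protocol reduces to: take a bounding box in the first frame, then re-locate the target in every subsequent frame. The following is a deliberately naive template-matching sketch of that loop, not the method of any paper listed below; all function and variable names are illustrative.

```python
import numpy as np

def track(frames, init_box, search_margin=16):
    """Toy single-object tracker: the target is given as a bounding
    box (x, y, w, h) in the first frame and re-located in each later
    frame by exhaustive SSD template matching inside a search window
    around the previous position. Illustrative only."""
    x, y, w, h = init_box
    template = frames[0][y:y + h, x:x + w].astype(np.float64)
    boxes = [init_box]
    for frame in frames[1:]:
        H, W = frame.shape[:2]
        # restrict the search to a window around the last known position
        x0, y0 = max(0, x - search_margin), max(0, y - search_margin)
        x1, y1 = min(W - w, x + search_margin), min(H - h, y + search_margin)
        best, best_xy = np.inf, (x, y)
        for yy in range(y0, y1 + 1):
            for xx in range(x0, x1 + 1):
                patch = frame[yy:yy + h, xx:xx + w].astype(np.float64)
                ssd = np.sum((patch - template) ** 2)
                if ssd < best:
                    best, best_xy = ssd, (xx, yy)
        x, y = best_xy
        boxes.append((x, y, w, h))
    return boxes
```

Modern trackers replace the fixed template and exhaustive search with learned features and regression heads, but the input/output contract is the same: one box in, one box per frame out.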
Libraries
Use these libraries to find Visual Tracking models and implementations.

Latest papers
Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
In this paper, we ask whether the fine-tuning performance of extremely simple, small-scale ViTs can also benefit from this pre-training paradigm, a question that remains considerably less studied than the well-established methodology of designing lightweight architectures with sophisticated components.
Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline
Current event-based and frame-event-based trackers are evaluated on short-term tracking datasets; however, real-world scenarios involve long-term tracking, and the performance of existing tracking algorithms in such settings remains unclear.
VastTrack: Vast Category Visual Object Tracking
The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.
Unifying Visual and Vision-Language Tracking via Contrastive Learning
Single object tracking aims to locate the target object in a video sequence according to the state specified by different modal references, including the initial bounding box (BBOX), natural language (NL), or both (NL+BBOX).
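The three initialization modes can be captured by a small reference type. Below is a minimal sketch; the class name and fields are illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TrackingReference:
    """Target specification for unified single object tracking.
    At least one field must be set: BBOX-only, NL-only, or NL+BBOX."""
    bbox: Optional[Tuple[float, float, float, float]] = None  # (x, y, w, h) in the first frame
    language: Optional[str] = None  # natural-language description of the target

    def __post_init__(self):
        if self.bbox is None and self.language is None:
            raise ValueError("provide a bounding box, a language description, or both")
```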
Explicit Visual Prompts for Visual Object Tracking
Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates.
ODTrack: Online Dense Temporal Token Learning for Visual Tracking
To alleviate the above problem, we propose a simple, flexible, and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames in an online token propagation manner.
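A rough sketch of online token propagation is given below, assuming a transformer backbone that accepts extra tokens. This is an illustrative reimplementation of the general idea, not ODTrack's actual code; all module and variable names are made up.

```python
import torch
import torch.nn as nn

class TokenPropagatingTracker(nn.Module):
    """Toy video-level pipeline: a learnable temporal token is encoded
    together with each frame's patch tokens, and the updated token is
    carried over to the next frame, so context accumulates online
    instead of via explicit template updates."""
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.temporal_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, 4)  # predicts one box per frame

    def forward(self, frame_tokens):
        # frame_tokens: (T, N, dim) -- patch tokens for each video frame
        token = self.temporal_token  # (1, 1, dim)
        boxes = []
        for t in range(frame_tokens.size(0)):
            x = torch.cat([token, frame_tokens[t:t + 1]], dim=1)  # (1, 1+N, dim)
            x = self.encoder(x)
            token = x[:, :1]  # propagate the updated token to the next frame
            boxes.append(self.head(token.squeeze(1)))
        return torch.stack(boxes, dim=0)  # (T, 1, 4)
```

For example, `TokenPropagatingTracker()(torch.randn(8, 196, 256))` yields one predicted box per frame for an 8-frame clip.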
Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset
Visual tracking often faces challenges such as invalid targets and decreased performance in low-light conditions when relying solely on RGB image sequences.
ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual Tracking
To this end, we non-uniformly resize the cropped image so that the overall input size is smaller while regions where the target is more likely to appear retain a higher resolution, and vice versa.
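The core idea of target-aware non-uniform resizing can be sketched with a warped sampling grid that is denser near the target. The snippet below is an illustrative reconstruction, not ZoomTrack's actual formulation; the function names and the Gaussian density choice are assumptions.

```python
import numpy as np

def warp_axis(length_in, length_out, center, sigma, boost=3.0):
    """Map output coordinates to input coordinates so the region
    around `center` is sampled more densely (appears larger) even
    though length_out < length_in overall."""
    u = np.arange(length_in, dtype=np.float64)
    # sampling density: uniform baseline plus a Gaussian bump at the target
    density = 1.0 + boost * np.exp(-0.5 * ((u - center) / sigma) ** 2)
    cdf = np.cumsum(density)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])  # normalize to [0, 1]
    # invert the CDF: evenly spaced quantiles cluster where density is high
    q = np.linspace(0.0, 1.0, length_out)
    return np.interp(q, cdf, u)  # input coordinate for each output pixel

def nonuniform_resize(image, out_hw, target_xy, sigma=20.0):
    """Resize `image` to out_hw = (H_out, W_out), keeping resolution
    high around target_xy = (x, y), via nearest-neighbour sampling."""
    h, w = image.shape[:2]
    ys = warp_axis(h, out_hw[0], target_xy[1], sigma)
    xs = warp_axis(w, out_hw[1], target_xy[0], sigma)
    yi = np.clip(np.round(ys).astype(int), 0, h - 1)
    xi = np.clip(np.round(xs).astype(int), 0, w - 1)
    return image[np.ix_(yi, xi)]
```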
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
Tracking using bio-inspired event cameras has drawn more and more attention in recent years.
LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking
As an example, our fastest variant, LiteTrack-B4, achieves 65.2% AO on the GOT-10k benchmark, surpassing all preceding efficient trackers, while running at over 100 fps with ONNX on the Jetson Orin NX edge device.