Video Object Tracking
28 papers with code • 3 benchmarks • 11 datasets
Video Object Tracking aims to detect and follow targets in videos using both spatial and temporal information. It is usually tightly integrated with tasks such as Object Detection and Object Tracking.
Libraries
Use these libraries to find Video Object Tracking models and implementations
Latest papers
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames.
Single-Model and Any-Modality for Video Object Tracking
In practice, most existing RGB trackers learn a single set of parameters to use them across datasets and applications.
Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking
Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.
Track Anything: Segment Anything Meets Videos
Therefore, in this report, we propose Track Anything Model (TAM), which achieves high-performance interactive tracking and segmentation in videos.
Target-Aware Tracking with Long-term Context Attention
Most deep trackers still follow the guidance of the siamese paradigms and use a template that contains only the target without any contextual information, which makes it difficult for the tracker to cope with large appearance changes, rapid target movement, and attraction from similar objects.
A Real-Time Wrong-Way Vehicle Detection Based on YOLO and Centroid Tracking
By detecting wrong-way vehicles, the number of accidents can be minimized and traffic jams can be reduced.
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.
Revealing the Dark Secrets of Masked Image Modeling
In this paper, we compare MIM with the long-dominant supervised pre-trained models from two perspectives, the visualizations and the experiments, to uncover their key representational differences.
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Moreover, it can anticipate object motion and interactions, which are crucial abilities for conceptual planning and reasoning.