Video Object Tracking
25 papers with code • 3 benchmarks • 8 datasets
Video Object Detection aims to detect targets in videos using both spatial and temporal information. It's usually deeply integrated with tasks such as Object Detection and Object Tracking.
Libraries
Use these libraries to find Video Object Tracking models and implementations
Most implemented papers
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks.
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.
TSM: Temporal Shift Module for Efficient Video Understanding
The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.
Video Polyp Segmentation: A Deep Learning Perspective
We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era.
Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking
Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking
The key innovations of the proposed method include adaptive spatial feature selection and temporal consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold.
Weakly Supervised Convolutional LSTM Approach for Tool Tracking in Laparoscopic Videos
Results: We build a baseline tracker on top of the CNN model and demonstrate that our approach based on the ConvLSTM outperforms the baseline in tool presence detection, spatial localization, and motion tracking by over 5.0%, 13.9%, and 12.6%, respectively.
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
In this work, we build a video dataset with fully observable and controllable object and scene bias, and which truly requires spatiotemporal understanding in order to be solved.
SPARK: Spatial-aware Online Incremental Attack Against Visual Tracking
We identify that online object tracking poses two new challenges: 1) it is difficult to generate imperceptible perturbations that can transfer across frames, and 2) real-time trackers require the attack to satisfy a certain level of efficiency.
Argus: Efficient Activity Detection System for Extended Video Analysis
We propose an Efficient Activity Detection System, Argus, for Extended Video Analysis in the surveillance scenario.