Video Object Tracking

28 papers with code • 3 benchmarks • 11 datasets

Video Object Detection aims to detect targets in videos using both spatial and temporal information. It's usually deeply integrated with tasks such as Object Detection and Object Tracking.


Use these libraries to find Video Object Tracking models and implementations
3 papers

Most implemented papers

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

open-mmlab/mmaction2 CVPR 2017

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 CVPR 2023

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56. 8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module ICCV 2019

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.

Video Polyp Segmentation: A Deep Learning Perspective

DengPingFan/PraNet 27 Mar 2022

We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era.

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

rohitgirdhar/CATER 10 Oct 2019

In this work, we build a video dataset with fully observable and controllable object and scene bias, and which truly requires spatiotemporal understanding in order to be solved.

Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking

BingfengYan/CO-MOT 22 May 2023

Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.

Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking


The key innovations of the proposed method include adaptive spatial feature selection and temporal consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold.

Weakly Supervised Convolutional LSTM Approach for Tool Tracking in Laparoscopic Videos

CAMMA-public/ConvLSTM-Surgical-Tool-Tracker 4 Dec 2018

Results: We build a baseline tracker on top of the CNN model and demonstrate that our approach based on the ConvLSTM outperforms the baseline in tool presence detection, spatial localization, and motion tracking by over 5. 0%, 13. 9%, and 12. 6%, respectively.

SPARK: Spatial-aware Online Incremental Attack Against Visual Tracking

tsingqguo/AttackTracker ECCV 2020

We identify that online object tracking poses two new challenges: 1) it is difficult to generate imperceptible perturbations that can transfer across frames, and 2) real-time trackers require the attack to satisfy a certain level of efficiency.