Video Object Detection
71 papers with code • 8 benchmarks • 12 datasets
Video object detection is the task of detecting objects in video sequences rather than in still images.
(Image credit: Learning Motion Priors for Efficient Video Object Detection)
Most implemented papers
Emerging Properties in Self-Supervised Vision Transformers
In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).
TSM: Temporal Shift Module for Efficient Video Understanding
The explosive growth in video streaming gives rise to challenges in performing video understanding with high accuracy and low computation cost.
Mobile Video Object Detection with Temporally-Aware Feature Maps
This paper introduces an online model for object detection in videos designed to run in real-time on low-powered mobile and embedded devices.
Towards High Performance Video Object Detection for Mobiles
In this paper, we present a lightweight network architecture for video object detection on mobiles.
Transferable Adversarial Attacks for Image and Video Object Detection
Adversarial examples have been demonstrated to threaten many computer vision tasks including object detection.
Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection
In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.
HoughNet: Integrating near and long-range evidence for visual detection
This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method.
TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while performing as well as previous complex hand-crafted detectors.
Flow-Guided Feature Aggregation for Video Object Detection
The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc.
Looking Fast and Slow: Memory-Guided Mobile Video Object Detection
Models and examples built with TensorFlow