Real-Time Object Detection

106 papers with code • 7 benchmarks • 8 datasets

Real-Time Object Detection is a computer vision task that involves identifying and locating objects of interest in video sequences as frames arrive, with inference fast enough to keep pace with the stream while maintaining a base level of accuracy.

The task is typically solved with algorithms that combine object detection and tracking techniques to detect and track objects in real time. These algorithms use feature extraction, object proposal generation, and classification to detect and localize objects of interest.

(Image credit: CenterNet)

Most implemented papers

YOLOv3: An Incremental Improvement

open-mmlab/mmdetection 8 Apr 2018

At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster.
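
To relate the quoted latency to a frame rate, the conversion can be sketched as below. The helper name `frames_per_second` is hypothetical, and the calculation assumes one image per forward pass with no pre- or post-processing overhead, which the snippet above does not specify.

```python
def frames_per_second(latency_ms: float) -> float:
    """Convert per-image inference latency (milliseconds) to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# 22 ms per image is the figure quoted for YOLOv3 at 320 x 320 input.
yolov3_fps = frames_per_second(22.0)
print(f"{yolov3_fps:.1f} FPS")  # 45.5 FPS
```

Under these assumptions, 22 ms per image corresponds to roughly 45 frames per second, comfortably above the 30 frames per second of common video.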

YOLO9000: Better, Faster, Stronger

AlexeyAB/darknet CVPR 2017

On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP.

Focal Loss for Dense Object Detection

facebookresearch/detectron ICCV 2017

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
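
A minimal sketch of how such a loss down-weights easy examples, using the commonly cited form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name `binary_focal_loss` and the defaults gamma = 2 and alpha = 0.25 are illustrative assumptions here, not values stated in the snippet above.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one binary prediction.

    p is the predicted probability of the positive class, y is the label (0 or 1).
    gamma and alpha are assumed defaults, chosen for illustration.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)            # avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_negative = binary_focal_loss(p=0.01, y=0)  # background scored confidently as background
hard_negative = binary_focal_loss(p=0.90, y=0)  # background mistaken for an object
print(easy_negative < hard_negative)  # True
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which makes the effect of the focusing term easy to compare.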

YOLOv4: Optimal Speed and Accuracy of Object Detection

AlexeyAB/darknet 23 Apr 2020

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

rbgirshick/py-faster-rcnn NeurIPS 2015

In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
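
As a rough illustration of what a region proposal network scores at each feature-map location, the sketch below generates reference boxes (anchors) centered on one point. The function name `make_anchors` is hypothetical, and the three scales and three aspect ratios are assumed values for illustration; the snippet above does not list them.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy).

    Each anchor has area scale**2 and height/width equal to ratio.
    Scales and ratios are assumed values, not taken from the text above.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(0.0, 0.0)
print(len(anchors))  # 9
```

A proposal network would then predict, for every anchor at every location, an objectness score and box offsets from the shared convolutional features; that learned part is omitted here.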

Mask R-CNN

tensorflow/models ICCV 2017

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

You Only Look Once: Unified, Real-Time Object Detection

AlexeyAB/darknet CVPR 2016

A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
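
The single-evaluation idea can be sketched as reading boxes and class scores straight out of one output tensor. The grid size S = 7, B = 2 boxes per cell, C = 20 classes, and the per-cell vector layout used below are assumptions for illustration; real implementations may order the values differently. The names `yolo_output_shape` and `decode_cell` are hypothetical.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape of the output tensor: S x S cells, each with B boxes (5 values) plus C class scores."""
    return (S, S, B * 5 + C)


def decode_cell(cell, row, col, S=7, B=2, C=20):
    """Decode one cell's vector into (cx, cy, w, h, score, class_id) tuples.

    Assumed layout: B groups of (x, y, w, h, confidence), then C class probabilities.
    x and y are offsets inside the cell; w and h are relative to the whole image.
    """
    class_probs = cell[B * 5:B * 5 + C]
    best_class = max(range(C), key=lambda k: class_probs[k])
    detections = []
    for b in range(B):
        x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
        cx = (col + x) / S  # cell offset -> image-relative center
        cy = (row + y) / S
        score = conf * class_probs[best_class]  # class-specific confidence
        detections.append((cx, cy, w, h, score, best_class))
    return detections


print(yolo_output_shape())  # (7, 7, 30)
```

A full pipeline would decode every cell, discard low-scoring boxes, and apply non-maximum suppression; those steps are left out of this sketch.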

CSPNet: A New Backbone that can Enhance Learning Capability of CNN

AlexeyAB/darknet 27 Nov 2019

Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection.

Objects as Points

xingyizhou/CenterNet 16 Apr 2019

We model an object as a single point: the center point of its bounding box.
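
One way to picture detection from center points is to treat the network output as a per-class heatmap and keep its local maxima. The sketch below does this for a single small heatmap; the function name `find_center_peaks`, the 0.3 threshold, and the toy heatmap are hypothetical, and the size and offset regression a full detector also needs is omitted.

```python
def find_center_peaks(heatmap, threshold=0.3, top_k=100):
    """Return (score, row, col) for cells that are >= all 8 neighbours and >= threshold."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((v, r, c))
    peaks.sort(reverse=True)  # strongest centers first
    return peaks[:top_k]


toy_heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.0],
    [0.1, 0.2, 0.1, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
print(find_center_peaks(toy_heatmap))  # [(0.9, 1, 1), (0.6, 2, 3)]
```

Because each object is represented by one peak, this kind of local-maximum filter can stand in for box-level non-maximum suppression in the sketch.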

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

microsoft/Swin-Transformer ICCV 2021

This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.