Object Detection
4332 papers with code • 114 benchmarks • 297 datasets
Object Detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and boundaries of objects in an image, and classifying the objects into different categories. It forms a crucial part of vision recognition, alongside image classification and retrieval.
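As a minimal sketch of what this looks like in practice, the snippet below runs a pretrained torchvision detector on a single image and prints each detected box, class label, and confidence score. The image path is a hypothetical placeholder, and the 0.5 score cutoff is an illustrative choice, not a standard.

```python
import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

# Load a pretrained detector; "street.jpg" is a placeholder path.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("street.jpg").convert("RGB"))
with torch.no_grad():
    output = model([image])[0]  # dict with "boxes", "labels", "scores"

for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
    if score >= 0.5:  # keep only confident detections
        print(f"class {label.item()} at {[round(v) for v in box.tolist()]} (score {score:.2f})")
```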
The state-of-the-art methods can be categorized into two main types: one-stage methods and two-stage methods.

- One-stage methods prioritize inference speed; example models include YOLO, SSD, and RetinaNet.
- Two-stage methods prioritize detection accuracy; example models include Faster R-CNN, Mask R-CNN, and Cascade R-CNN (see the sketch after this list).
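To make the speed/accuracy trade-off concrete, the sketch below loads one pretrained torchvision model from each family behind the same interface and times a forward pass on a random image. Timings are hardware-dependent and purely illustrative.

```python
import time
import torch
import torchvision

# One pretrained model from each family, behind the same torchvision API.
models = {
    "RetinaNet (one-stage)": torchvision.models.detection.retinanet_resnet50_fpn,
    "Faster R-CNN (two-stage)": torchvision.models.detection.fasterrcnn_resnet50_fpn,
}
dummy = [torch.rand(3, 480, 640)]  # one random RGB image in [0, 1]
for name, ctor in models.items():
    model = ctor(weights="DEFAULT").eval()
    with torch.no_grad():
        start = time.perf_counter()
        model(dummy)
    print(f"{name}: {time.perf_counter() - start:.2f} s per image")
```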
The most popular benchmark is the MS COCO dataset. Models are typically evaluated according to the Mean Average Precision (mAP) metric.
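mAP is built on Intersection over Union (IoU) between predicted and ground-truth boxes: a prediction counts as a true positive when it matches an unmatched ground-truth box above an IoU threshold, and COCO-style mAP averages AP over classes and over IoU thresholds from 0.50 to 0.95. A minimal, framework-free sketch of the IoU building block, assuming boxes in [x1, y1, x2, y2] format:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Overlap of 5x5 on two 10x10 boxes: 25 / 175 ≈ 0.143
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))
```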
(Image credit: Detectron)
Libraries
Use these libraries to find Object Detection models and implementations.
Subtasks
- 3D Object Detection
- Real-Time Object Detection
- RGB Salient Object Detection
- Few-Shot Object Detection
- Open Vocabulary Object Detection
- Video Object Detection
- Object Detection In Aerial Images
- RGB-D Salient Object Detection
- Weakly Supervised Object Detection
- Small Object Detection
- Robust Object Detection
- Zero-Shot Object Detection
- Camouflaged Object Segmentation
- Medical Object Detection
- Open World Object Detection
- Video Salient Object Detection
- Co-Salient Object Detection
- Object Proposal Generation
- Dense Object Detection
- Head Detection
- License Plate Detection
- Fracture Detection
- Multiview Detection
- 3D Object Detection From Monocular Images
- Moving Object Detection
- One-Shot Object Detection
- Surgical Tool Detection
- Described Object Detection
- Body Detection
- Pupil Detection
- Object Detection In Indoor Scenes
- Class-agnostic Object Detection
- Weakly Supervised 3D Detection
- Object Skeleton Detection
- Semantic Part Detection
- Fish Detection
- Multiple Affordance Detection
Most implemented papers
Deep Residual Learning for Image Recognition
Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
YOLOv3: An Incremental Improvement
At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster.
Focal Loss for Dense Object Detection
Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
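The loss is compact enough to state directly: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), which down-weights easy, well-classified examples; the paper recommends alpha = 0.25 and gamma = 2. A minimal PyTorch sketch of the binary (sigmoid) form:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")  # -log(p_t)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Toy usage: random logits against random binary targets.
logits = torch.randn(8, 10)
targets = torch.randint(0, 2, (8, 10)).float()
print(focal_loss(logits, targets).item())
```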
YOLO9000: Better, Faster, Stronger
On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP.
YOLOv4: Optimal Speed and Accuracy of Object Detection
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy.
SSD: Single Shot MultiBox Detector
Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Mask R-CNN
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
We present a class of efficient models called MobileNets for mobile and embedded vision applications.
MobileNetV2: Inverted Residuals and Linear Bottlenecks
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes.