Object Detection
3643 papers with code • 84 benchmarks • 251 datasets
Object Detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and extent of each object, typically as a bounding box, and classifying each object into one of a set of categories. It is a core part of visual recognition, alongside image classification and retrieval.
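As a minimal illustration of what a detector outputs, each detection can be represented as an axis-aligned bounding box, a class label and a confidence score. The sketch below assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the `Detection` class and `iou` helper are hypothetical names, not taken from any particular library. It also computes intersection-over-union (IoU), the usual overlap measure for deciding whether a predicted box matches a ground-truth box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical structure): box, class label, confidence."""
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge
    label: str
    score: float

    def area(self) -> float:
        # Clamp to zero so degenerate boxes have no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two boxes, a value in [0, 1]."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0
```

For example, two 10×10 boxes shifted 5 units apart horizontally overlap in a 5×10 region, so their IoU is 50 / (100 + 100 − 50) = 1/3.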
The state-of-the-art methods fall into two main types, one-stage methods and two-stage methods:
- One-stage methods prioritize inference speed; example models include YOLO, SSD and RetinaNet.
- Two-stage methods prioritize detection accuracy; example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.
The most popular benchmark is the MS COCO dataset. Models are typically evaluated using the mean Average Precision (mAP) metric.
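A simplified sketch of how Average Precision (AP) can be computed for a single class at a single IoU threshold, with mAP as the mean of per-class AP. It assumes Pascal VOC-style greedy matching (each prediction is compared with its highest-IoU ground-truth box, which must clear the threshold and not already be claimed) and all-point interpolation of the precision-recall curve. The official COCO metric differs: it also averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05 and uses 101-point interpolation, so reported numbers should come from the reference evaluation code (e.g. pycocotools). The function names and input format here are illustrative only.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed convention
Pred = Tuple[str, float, Box]            # (image_id, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Pred], gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """AP for one class at one IoU threshold (all-point interpolation)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    claimed = {img: [False] * len(boxes) for img, boxes in gts.items()}

    # Walk predictions from most to least confident, labelling each TP or FP.
    is_tp = []
    for img, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(gts.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # VOC-style: the best-matching ground truth must clear the threshold
        # and must not already be claimed by a higher-scoring prediction.
        if best_j >= 0 and best_iou >= iou_thr and not claimed[img][best_j]:
            claimed[img][best_j] = True
            is_tp.append(True)
        else:
            is_tp.append(False)

    # Cumulative precision and recall after each prediction.
    recalls, precisions = [], []
    tp = fp = 0
    for hit in is_tp:
        tp += hit
        fp += not hit
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))

    # Make precision non-increasing in recall, then sum the area under the curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(
        per_class: Dict[str, Tuple[List[Pred], Dict[str, List[Box]]]],
        iou_thr: float = 0.5) -> float:
    """mAP as the unweighted mean of per-class AP."""
    aps = [average_precision(p, g, iou_thr) for p, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For instance, with two ground-truth boxes and three ranked predictions (one exact hit, one spurious box, one slightly offset hit with IoU ≈ 0.82), this gives AP = 5/6 ≈ 0.833 at an IoU threshold of 0.5.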
(Image credit: Detectron)
Subtasks
- 3D Object Detection
- Real-Time Object Detection
- RGB Salient Object Detection
- Few-Shot Object Detection
- Video Object Detection
- RGB-D Salient Object Detection
- Object Detection In Aerial Images
- Weakly Supervised Object Detection
- Open Vocabulary Object Detection
- Robust Object Detection
- Small Object Detection
- Medical Object Detection
- Zero-Shot Object Detection
- Co-Salient Object Detection
- Object Proposal Generation
- Dense Object Detection
- Video Salient Object Detection
- Open World Object Detection
- Camouflaged Object Segmentation
- License Plate Detection
- Head Detection
- Multiview Detection
- One-Shot Object Detection
- Moving Object Detection
- Surgical Tool Detection
- 3D Object Detection From Monocular Images
- Body Detection
- Pupil Detection
- Object Detection In Indoor Scenes
- Described Object Detection
- Semantic Part Detection
- Class-agnostic Object Detection
- Object Skeleton Detection
- Fish Detection
- Multiple Affordance Detection
Latest papers
Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering
This work explores the zero-shot capabilities of foundation models in Visual Question Answering (VQA) tasks.
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and the 2 object detection datasets under the zero-shot recognition setting.
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
However, in the real-world where test ground truths are not provided, it is non-trivial to find out whether bounding boxes are accurate, thus preventing us from assessing the detector generalization ability.
Wildfire danger prediction optimization with transfer learning
Convolutional Neural Networks (CNNs) have proven instrumental across various computer science domains, enabling advancements in object detection, classification, and anomaly detection.
LSKNet: A Foundation Lightweight Backbone for Remote Sensing
While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios.
Continual Forgetting for Pre-trained Vision Models
(i) For unwanted knowledge, efficient and effective deleting is crucial.
Align and Distill: Unifying and Improving Domain Adaptive Object Detection
We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin.
CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations
Object detection methods under known single degradations have been extensively investigated.
YOLOv9 for Fracture Detection in Pediatric Wrist Trauma X-ray Images
The introduction of YOLOv9, the latest version of the You Only Look Once (YOLO) series, has led to its widespread adoption across various scenarios.