Object Detection
3789 papers with code • 92 benchmarks • 267 datasets
Object Detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and boundaries of objects in an image, and classifying the objects into different categories. It forms a crucial part of vision recognition, alongside image classification and retrieval.
The state-of-the-art methods can be categorized into two main types: one-stage methods and two stage-methods:
-
One-stage methods prioritize inference speed, and example models include YOLO, SSD and RetinaNet.
-
Two-stage methods prioritize detection accuracy, and example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.
The most popular benchmark is the MSCOCO dataset. Models are typically evaluated according to a Mean Average Precision metric.
( Image credit: Detectron )
Libraries
Use these libraries to find Object Detection models and implementationsDatasets
Subtasks
- 3D Object Detection
- Real-Time Object Detection
- RGB Salient Object Detection
- Few-Shot Object Detection
- Few-Shot Object Detection
- Video Object Detection
- Open Vocabulary Object Detection
- RGB-D Salient Object Detection
- Object Detection In Aerial Images
- Weakly Supervised Object Detection
- Small Object Detection
- Robust Object Detection
- Zero-Shot Object Detection
- Medical Object Detection
- Open World Object Detection
- Object Proposal Generation
- Co-Salient Object Detection
- Dense Object Detection
- Video Salient Object Detection
- Camouflaged Object Segmentation
- License Plate Detection
- Head Detection
- Multiview Detection
- 3D Object Detection From Monocular Images
- One-Shot Object Detection
- Moving Object Detection
- Surgical tool detection
- Described Object Detection
- Body Detection
- Pupil Detection
- Object Detection In Indoor Scenes
- Class-agnostic Object Detection
- Semantic Part Detection
- Object Skeleton Detection
- Fish Detection
- Multiple Affordance Detection
- Weakly Supervised 3D Detection
Latest papers
Bangladeshi Native Vehicle Detection in Wild
To advance terrestrial object detection research, this paper proposes a native vehicle detection dataset for the most commonly appeared vehicle classes in Bangladesh.
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
However, replacing LayerNorm with more efficient BatchNorm in transformer often leads to inferior performance and collapse in training.
FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention
Camera, LiDAR and radar are common perception sensors for autonomous driving tasks.
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Empirical results demonstrate the effectiveness of Grounding DINO 1. 5, with the Grounding DINO 1. 5 Pro model attaining a 54. 3 AP on the COCO detection benchmark and a 55. 7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection.
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
Instance segmentation is data-hungry, and as model capacity increases, data scale becomes crucial for improving the accuracy.
Grounded 3D-LLM with Referent Tokens
Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning.
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Open-vocabulary object detection (OvOD) has transformed detection into a language-guided task, empowering users to freely define their class vocabularies of interest during inference.
Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection
This paper explores the size-invariance of evaluation metrics in Salient Object Detection (SOD), especially when multiple targets of diverse sizes co-exist in the same image.
SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network
We develop a simulated hyperSpectral Point Object Detection benchmark termed SPOD, and for the first time, evaluate and compare the performance of current object detection networks and HTD methods on hyperspectral multi-class point object detection.
Towards Task-Compatible Compressible Representations
We evaluate the impact of this idea in the context of input reconstruction more rigorously and extended it to other computer vision tasks.