Object Detection

3789 papers with code • 92 benchmarks • 267 datasets

Object Detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and boundaries of objects in an image and classifying them into categories. It forms a crucial part of visual recognition, alongside image classification and retrieval.
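
As a rough illustration of what "position, boundaries and category" can look like in practice, the sketch below represents each detection as an axis-aligned bounding box with a class label and a confidence score. The `Detection` class, the `keep_confident` helper, the corner-coordinate box format and the 0.5 threshold are all assumptions made for this example; real detectors and datasets use a variety of formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical structure, for illustration only)."""
    x_min: float  # left edge of the bounding box, in pixels
    y_min: float  # top edge
    x_max: float  # right edge
    y_max: float  # bottom edge
    label: str    # predicted category
    score: float  # model confidence in [0, 1]

    @property
    def area(self) -> float:
        """Box area in square pixels (0 for degenerate boxes)."""
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def keep_confident(detections, threshold=0.5):
    """Return detections scoring at least `threshold`, highest score first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Made-up detections for a single image.
detections = [
    Detection(10, 20, 110, 220, "person", 0.92),
    Detection(200, 50, 260, 90, "dog", 0.35),
    Detection(150, 30, 300, 180, "car", 0.71),
]

for d in keep_confident(detections):
    print(f"{d.label}: score={d.score:.2f}, area={d.area:.0f}")
```

Here the low-confidence "dog" box is dropped and the remaining boxes are listed by descending score.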

State-of-the-art methods fall into two main types, one-stage and two-stage:

One-stage methods prioritize inference speed, and example models include YOLO, SSD and RetinaNet.
Two-stage methods prioritize detection accuracy, and example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.

The most popular benchmark is the MS COCO dataset. Models are typically evaluated with a mean Average Precision (mAP) metric.
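
To make the metric more concrete, here is a minimal sketch of Intersection over Union (IoU) and a simplified Average Precision. It assumes boxes given as `(x_min, y_min, x_max, y_max)`, a single IoU threshold of 0.5, greedy matching of each prediction to the best still-unmatched ground-truth box, and non-interpolated AP. The function names are hypothetical. The official COCO evaluation differs in detail (it averages over IoU thresholds from 0.50 to 0.95 and interpolates the precision-recall curve), so this should be read as an illustration rather than a reference implementation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truths, iou_threshold=0.5):
    """Simplified AP for one class.

    predictions: list of (score, box); ground_truths: list of boxes.
    """
    if not ground_truths:
        return 0.0
    matched = set()  # indices of ground-truth boxes already claimed
    hits = []        # True for a true positive, False for a false positive
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(True)
        else:
            hits.append(False)
    # Non-interpolated AP: sum precision at each true positive, divide by #GT.
    true_positives, total = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            total += true_positives / rank
    return total / len(ground_truths)


def mean_average_precision(per_class):
    """Mean of per-class AP; per_class maps label -> (predictions, ground_truths)."""
    aps = [average_precision(p, g) for p, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up example: two classes, one image.
person_gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
person_preds = [
    (0.9, (0, 0, 10, 10)),     # exact match
    (0.8, (50, 50, 60, 60)),   # no overlap with any ground truth
    (0.7, (20, 20, 30, 31)),   # near match
]
dog_gts = [(0, 0, 5, 5)]
dog_preds = [(0.6, (0, 0, 5, 5))]

per_class = {"person": (person_preds, person_gts), "dog": (dog_preds, dog_gts)}
print(f"person AP: {average_precision(person_preds, person_gts):.3f}")
print(f"mAP: {mean_average_precision(per_class):.3f}")
```

Predictions are processed in descending score order, so a confident false positive lowers the precision of every true positive ranked below it.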

(Image credit: Detectron)

Benchmarks

These leaderboards are used to track progress in Object Detection.

Dataset	Best Model	Compare
COCO test-dev	Co-DETR	See all
COCO minival	Co-DETR	See all
COCO-O	EVA	See all
PASCAL VOC 2007	Cascade Eff-B7 NAS-FPN (Copy Paste pre-training, single-scale)	See all
COCO 2017 val	Salience-DETR (Focal-L 1x)	See all
COCO 2017	MaxViT-B	See all
CrowdHuman (full body)	InternImage-H	See all
CPPE-5	TridentNet	See all
Waymo 2D detection all_ns f0val	YOLOX-L	See all
Manga109-s 15test	YOLOX-L	See all
USB (Standard USB 1.0 protocol)	UniverseNet-20.08	See all
LVIS v1.0 val	Co-DETR (single-scale)	See all
GEN1 Detection	ERGO-12	See all
SeaDronesSee	Synth Pretrained Faster R-CNN ResNeXt-101-FPN	See all
UA-DETRAC	VSTAM	See all
GRAZPEDWRI-DX	YOLOv8+ResCBAM	See all
UAVDT	PRB-FPN	See all
PASCAL VOC 2012	InternImage-H	See all
NAO	Mask RCNN R50	See all
TBBR	Swin-T (ImageNet-1k pretrain)	See all
ODinW Full-Shot 13 Tasks	Grounding DINO 1.5 Pro	See all
PeopleArt	PVT (Pyramid Vision Transformer; trained on PeopleArt and PopArt)	See all
BigDetection val	Cascade R-CNN (R50-FPN)	See all
KITTI Cars Easy	Patches	See all
KITTI Cars Hard	Patches	See all
iSAID	PANet++	See all
Waymo Open Dataset	LeapMotor_Det	See all
AI-TOD	DetectoRS + NWD (ResNet-50-FPN)	See all
LVIS v1.0 minival	Co-DETR (single-scale)	See all
India Driving Dataset	YOLOv5x	See all
KITTI Cars Moderate	Patches	See all
WiderPerson	IterDet (Faster RCNN, ResNet50, 2 iterations)	See all
WaterScenes	YOLOv8-M	See all
Visual Genome	AP (%)	See all
MS COCO	MOAT-2	See all
VisDrone-DET2019	PP-YOLOE-plus	See all
FlickrLogos-32	Logo-Yolo	See all
VEDAI	GHOST	See all
SIXray	LRPz	See all
nuScenes	RANet(Radar)	See all
PASCAL VOC 10%	DETReg (MDef-DETR)	See all
MSCOCO	YOLOv5s	See all
Drone vs Bird	OBSS YOLOv5+Track Boosting (Including Synthetic Data)	See all
SpaceNet 2	YOLT	See all
ODinW Full-shot 35 Tasks	Grounding DINO 1.5 Pro	See all
Manga109	DASS-Detector (YOLOX XL)	See all
BDD100K val	PP-YOLOE	See all
OpenImages-v6	ScaleDet	See all
Pascal VOC to Clipart1K	MILA	See all
BDD100K	CDDMSL	See all
PASCAL Part 2010 - Animals	Attention-based Joint Detection of Object and Semantic Part	See all
SUN-RGBD val	CDSSD	See all
KITTI Pedestrians Easy	Vote3Deep	See all
KITTI Pedestrians Moderate	Vote3Deep	See all
KITTI Pedestrians Hard	Vote3Deep	See all
KITTI Cyclists Easy	Vote3Deep	See all
KITTI Cyclists Moderate	Vote3Deep	See all
KITTI Cyclists Hard	Vote3Deep	See all
Extragalactic Planetary Nebulae	PNe within NGC1380 & NGC1404	See all
COCO+	RepPoints + Self-adaptation	See all
Waymo 2D detection all_ns test	UniverseNet	See all
DeepTrash	YOLOv5	See all
MJU-Waste	EfficientDet-D2	See all
UAVVaste	EfficientDet-D2	See all
Extended TACO-1	EfficientDet-D2	See all
Drinking Waste Classification	EfficientDet-D2	See all
Extended TACO-7	EfficientDet-D2	See all
A2D	RL [10] Lpixel	See all
STN PLAD	MS-PAD	See all
A Dataset of Multispectral Potato Plants Images	Retina-UNet-Ag	See all
SpaceNet 1	YOLT	See all
AquaTrash	Aquavision	See all
ELEVATER	GLIP-T	See all
Object Detection on COCO minival	DaViT-T (Mask R-CNN, 36 epochs)	See all
Cityscapes to Foggy Cityscapes	CDDMSL	See all
CrowdHuman	S-RCNN+Ours	See all
CityPersons	V2F-Net	See all
SHEL5K	YOLO	See all
LVIS v1.0	ScaleDet	See all
Objects365	ScaleDet	See all
VisDrone - 1% labeled data	SSOD + Crop (L + U)	See all
VisDrone - 5% labeled data	SSOD + Crop (L + U)	See all
VisDrone - 10% labeled data	SSOD + Crop (L + U)	See all
Multispectral Dataset	TarDAL	See all
PASCAL VOC	TinyissimoYOLO-v8	See all
MSCOCO	DAS	See all
LDD	R^3-CNN	See all
Watercolor2k	CDDMSL	See all
Comic2k	CDDMSL	See all
Clipart1k	CDDMSL	See all
PASCAL VOC to Watercolor2k	CDDMSL	See all
PASCAL VOC to Comic2k	CDDMSL	See all

Libraries

Use these libraries to find Object Detection models and implementations

PaddlePaddle/PaddleDetection • 71 papers • 12,223 stars
open-mmlab/mmdetection • 64 papers • 28,200 stars
rwightman/pytorch-image-models • 39 papers • 30,258 stars
osmr/imgclsmob • 20 papers • 2,926 stars

See all 42 libraries.

Datasets

Subtasks

Few-Shot Object Detection

Video Object Detection

Open Vocabulary Object Detection

RGB-D Salient Object Detection

Object Detection In Aerial Images

Weakly Supervised Object Detection

Small Object Detection

Robust Object Detection

Zero-Shot Object Detection

Medical Object Detection

Open World Object Detection

Object Proposal Generation

Co-Salient Object Detection

Dense Object Detection

Video Salient Object Detection

Camouflaged Object Segmentation

License Plate Detection

Head Detection

Multiview Detection

3D Object Detection From Monocular Images

One-Shot Object Detection

Moving Object Detection

Surgical tool detection

Described Object Detection

Body Detection

Pupil Detection

Object Detection In Indoor Scenes

Class-agnostic Object Detection

Semantic Part Detection

Object Skeleton Detection

Fish Detection

Multiple Affordance Detection

Weakly Supervised 3D Detection

Latest papers

Bangladeshi Native Vehicle Detection in Wild

bipin-saha/bnvd • 20 May 2024

To advance terrestrial object detection research, this paper proposes a native vehicle detection dataset for the most commonly appeared vehicle classes in Bangladesh.

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

xinghaochen/slab • 19 May 2024

However, replacing LayerNorm with more efficient BatchNorm in transformer often leads to inferior performance and collapse in training.

FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention

ziongo6/fadet • 19 May 2024

Camera, LiDAR and radar are common perception sensors for autonomous driving tasks.

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

idea-research/grounding-dino-1.5-api • 16 May 2024

Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection.

463

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

aim-uofa/DiverGen • 16 May 2024

Instance segmentation is data-hungry, and as model capacity increases, data scale becomes crucial for improving the accuracy.

Grounded 3D-LLM with Referent Tokens

OpenRobotLab/Grounded_3D-LLM • 16 May 2024

Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning.

SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

naver/shine • 16 May 2024

Open-vocabulary object detection (OvOD) has transformed detection into a language-guided task, empowering users to freely define their class vocabularies of interest during inference.

Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection

ferry-li/si-sod • 16 May 2024

This paper explores the size-invariance of evaluation metrics in Salient Object Detection (SOD), especially when multiple targets of diverse sizes co-exist in the same image.

SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network

zhaoxuli123/specdetr • 16 May 2024

We develop a simulated hyperSpectral Point Object Detection benchmark termed SPOD, and for the first time, evaluate and compare the performance of current object detection networks and HTD methods on hyperspectral multi-class point object detection.

Towards Task-Compatible Compressible Representations

adeandrade/research • 16 May 2024

We evaluate the impact of this idea in the context of input reconstruction more rigorously and extended it to other computer vision tasks.
