TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Object Detection	COCO minival	Grounding DINO	box AP	63.0	# 12
Object Detection	COCO test-dev	Grounding DINO	box mAP	63.0	# 17
Zero-Shot Object Detection	LVIS v1.0 minival	GroundingDINO-L	AP	33.9	# 4
Zero-Shot Object Detection	MSCOCO	Grounding DINO (without COCO data)	AP 0.5	52.5	# 1
Zero-Shot Object Detection	ODinW	Grounding DINO	Average Score	26.1	# 1
Zero Shot Segmentation	Segmentation in the Wild	Grounded-SAM	Mean AP	46.0	# 2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/zero-shot-object-detection-on-mscoco)](https://paperswithcode.com/sota/zero-shot-object-detection-on-mscoco?p=grounding-dino-marrying-dino-with-grounded)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/zero-shot-object-detection-on-odinw)](https://paperswithcode.com/sota/zero-shot-object-detection-on-odinw?p=grounding-dino-marrying-dino-with-grounded)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/zero-shot-segmentation-on-segmentation-in-the)](https://paperswithcode.com/sota/zero-shot-segmentation-on-segmentation-in-the?p=grounding-dino-marrying-dino-with-grounded)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/zero-shot-object-detection-on-lvis-v1-0)](https://paperswithcode.com/sota/zero-shot-object-detection-on-lvis-v1-0?p=grounding-dino-marrying-dino-with-grounded)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/object-detection-on-coco-minival)](https://paperswithcode.com/sota/object-detection-on-coco-minival?p=grounding-dino-marrying-dino-with-grounded)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=grounding-dino-marrying-dino-with-grounded)`
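
The badge lines above all follow one pattern: a paper slug and a benchmark slug are substituted into a shields.io endpoint URL and a leaderboard link. The sketch below reproduces that pattern as inferred from the lines above, not from any documented API. The helper name `pwc_badge` is hypothetical, and the two base URLs are copied from the badges.

```python
# Base URLs copied from the badge lines above; the helper name is hypothetical.
SHIELDS = "https://img.shields.io/endpoint.svg?url="
PWC = "https://paperswithcode.com"


def pwc_badge(paper_slug: str, benchmark_slug: str) -> str:
    """Return badge Markdown for one paper/benchmark pair."""
    # JSON endpoint that shields.io renders as the badge image
    badge_json = f"{PWC}/badge/{paper_slug}/{benchmark_slug}"
    # Leaderboard page the badge links to
    leaderboard = f"{PWC}/sota/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({SHIELDS}{badge_json})]({leaderboard})"


PAPER = "grounding-dino-marrying-dino-with-grounded"
print(pwc_badge(PAPER, "object-detection-on-coco-minival"))
```

Passing a different benchmark slug, such as `zero-shot-object-detection-on-odinw`, should yield the matching line from the list above.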

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 Mar 2023 · Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object detection on novel categories, we propose to also perform evaluations on referring expression comprehension for objects specified with attributes. Grounding DINO performs remarkably well on all three settings, including benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. Grounding DINO achieves a 52.5 AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO. It sets a new record on the ODinW zero-shot benchmark with a mean 26.1 AP. Code will be available at https://github.com/IDEA-Research/GroundingDINO.
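
The abstract names language-guided query selection as one of the three fusion components but does not spell out the selection rule. One plausible reading, sketched below, scores each image token by its highest dot-product similarity to any text token and keeps the top-scoring tokens as decoder queries. The scoring rule, the function name `select_queries`, and the toy feature vectors are all assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_queries(
    image_feats: Sequence[Vector],
    text_feats: Sequence[Vector],
    num_queries: int,
) -> List[int]:
    """Return indices of the image tokens most similar to the text.

    Assumed rule: score = max similarity over all text tokens.
    """
    scores = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    # Rank image-token indices by score, highest first
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


# Toy 2-D features: four image tokens, two text tokens (made-up values)
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[0.0, 2.0], [0.1, 0.1]]
print(select_queries(image_feats, text_feats, 2))  # → [1, 2]
```

In a real detector the features would be batched tensors and the selected indices would initialize the decoder queries; the pure-Python version only shows the ranking step.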

Code

idea-research/groundingdino official

5,095

huggingface/transformers

125,796

IDEA-Research/Grounded-Segment-Anyt…

13,612

longzw1997/Open-GroundingDino

241

PaddlePaddle/PaddleMIX

217

See all 7 implementations

Tasks

Decoder

Object Detection

Referring Expression

Referring Expression Comprehension

Zero-Shot Object Detection

Zero Shot Segmentation

Datasets

MS COCO

LVIS

MSCOCO

Segmentation in the Wild

Results from the Paper

Ranked #1 on Zero-Shot Object Detection on MSCOCO

Methods

Dense Connections • Layer Normalization • Linear Layer • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Vision Transformer
