TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Open Vocabulary Object Detection	LVIS v1.0	ViLD-ensemble w/ ALIGN (Eb7-FPN)	AP novel-LVIS base training	26.3	# 8
Open Vocabulary Object Detection	LVIS v1.0	ViLD-ensemble w/ ALIGN (Eb7-FPN)	AP novel-Unrestricted open-vocabulary training	27.0	# 3
Open Vocabulary Object Detection	LVIS v1.0	ViLD-ensemble (R152-FPN)	AP novel-LVIS base training	18.7	# 18
Open Vocabulary Object Detection	LVIS v1.0	ViLD-ensemble (R152-FPN)	AP novel-Unrestricted open-vocabulary training	19.8	# 5
Open Vocabulary Object Detection	LVIS v1.0	ViLD-ensemble (R50-FPN)	AP novel-LVIS base training	16.6	# 21
Open Vocabulary Object Detection	LVIS v1.0	ViLD-ensemble (R50-FPN)	AP novel-Unrestricted open-vocabulary training	16.7	# 6
Open Vocabulary Object Detection	LVIS v1.0	ViLD (R50-FPN)	AP novel-LVIS base training	16.1	# 22
Open Vocabulary Object Detection	LVIS v1.0	ViLD (R50-FPN)	AP novel-Unrestricted open-vocabulary training	16.3	# 7
Open Vocabulary Object Detection	MSCOCO	ViLD	AP 0.5	27.6	# 24
Open Vocabulary Object Detection	Objects365	ViLD	mask AP50	18.2	# 2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/zero-shot-detection-via-vision-and-language/open-vocabulary-object-detection-on-1)](https://paperswithcode.com/sota/open-vocabulary-object-detection-on-1?p=zero-shot-detection-via-vision-and-language)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/zero-shot-detection-via-vision-and-language/open-vocabulary-object-detection-on-lvis-v1-0)](https://paperswithcode.com/sota/open-vocabulary-object-detection-on-lvis-v1-0?p=zero-shot-detection-via-vision-and-language)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/zero-shot-detection-via-vision-and-language/open-vocabulary-object-detection-on-mscoco)](https://paperswithcode.com/sota/open-vocabulary-object-detection-on-mscoco?p=zero-shot-detection-via-vision-and-language)`

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

ICLR 2022 · Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui

We aim at advancing open-vocabulary object detection, which detects objects described by arbitrary text inputs. The fundamental challenge is the availability of training data. It is costly to further scale up the number of classes contained in existing object detection datasets. To overcome this challenge, we propose ViLD, a training method via Vision and Language knowledge Distillation. Our method distills the knowledge from a pretrained open-vocabulary image classification model (teacher) into a two-stage detector (student). Specifically, we use the teacher model to encode category texts and image regions of object proposals. Then we train a student detector, whose region embeddings of detected boxes are aligned with the text and image embeddings inferred by the teacher. We benchmark on LVIS by holding out all rare categories as novel categories that are not seen during training. ViLD obtains 16.1 mask AP$_r$ with a ResNet-50 backbone, even outperforming the supervised counterpart by 3.8. When trained with a stronger teacher model ALIGN, ViLD achieves 26.3 AP$_r$. The model can directly transfer to other datasets without finetuning, achieving 72.2 AP$_{50}$ on PASCAL VOC, 36.6 AP on COCO and 11.8 AP on Objects365. On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP. Code and demo are open-sourced at https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/vild.
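The alignment described in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity with a softmax temperature for aligning region embeddings to category text embeddings, and an L1 distance for aligning them to the teacher's image embeddings of cropped proposals. The temperature value, the toy vectors, and every function name below are illustrative assumptions, not details taken from the abstract or from the released code, and the sketch leaves out the detector, proposal generation, and any background category.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def text_alignment_loss(region_emb, text_embs, label, temperature=0.01):
    """Cross-entropy over temperature-scaled cosine similarities.

    text_embs holds one teacher text embedding per training category.
    The temperature is an assumed hyperparameter.
    """
    logits = [cosine(region_emb, t) / temperature for t in text_embs]
    m = max(logits)
    # Numerically stable log-sum-exp.
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def image_distillation_loss(region_emb, teacher_image_emb):
    """L1 distance between the normalized student and teacher embeddings."""
    s = l2_normalize(region_emb)
    t = l2_normalize(teacher_image_emb)
    return sum(abs(x - y) for x, y in zip(s, t))


def classify(region_emb, text_embs, names):
    """Pick the category whose text embedding is most similar to the region."""
    sims = [cosine(region_emb, t) for t in text_embs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return names[best]


# Toy 3-d embeddings standing in for teacher outputs (hypothetical values).
names = ["cat", "dog"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
region = [0.9, 0.1, 0.0]

predicted = classify(region, text_embs, names)
loss_correct = text_alignment_loss(region, text_embs, label=0)
loss_wrong = text_alignment_loss(region, text_embs, label=1)
distill = image_distillation_loss(region, [1.8, 0.2, 0.0])
```

Because classification only compares a region embedding against text embeddings, new categories can be added at inference time by appending their text embeddings to `text_embs`, which is the property that makes the detector open-vocabulary.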

Code

tensorflow/tpu official

5,176

hanoonaR/object-centric-ovd

277

dyabel/detpro

159

Tasks

Image Classification

Knowledge Distillation

Object Detection

Open Vocabulary Object Detection

Zero-Shot Image Classification

Zero-Shot Object Detection

Datasets

MS COCO

CUB-200-2011

LVIS

PASCAL VOC

Objects365

Results from the Paper

Ranked #2 on Open Vocabulary Object Detection on Objects365

Methods

Convolution • Mask R-CNN • RoIAlign • RPN • Softmax
