Open Vocabulary Object Detection
56 papers with code • 4 benchmarks • 6 datasets
Open-vocabulary detection (OVD) aims to generalize beyond the limited set of base classes labeled during training. The goal is to detect novel classes defined by an unbounded (open) vocabulary at inference time.
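In practice, most OVD methods swap the detector's fixed classifier head for text embeddings of arbitrary category names and score each region proposal by similarity. A minimal sketch of that idea, using toy stand-in vectors rather than a real CLIP text encoder (the `vocab` embeddings and `classify_region` helper are illustrative, not any paper's API):

```python
import numpy as np

def classify_region(region_feat, vocab):
    """Score one region feature against an open vocabulary of text
    embeddings via cosine similarity (toy stand-in for a CLIP encoder)."""
    names = list(vocab)
    T = np.stack([vocab[n] for n in names])           # (V, D) text embeddings
    T = T / np.linalg.norm(T, axis=1, keepdims=True)  # L2-normalize rows
    r = region_feat / np.linalg.norm(region_feat)
    scores = T @ r                                    # cosine similarities
    return names[int(np.argmax(scores))], scores

# Toy 4-D embeddings; "zebra" is a novel class added only at inference.
vocab = {
    "cat":   np.array([1.0, 0.0, 0.0, 0.0]),
    "dog":   np.array([0.0, 1.0, 0.0, 0.0]),
    "zebra": np.array([0.0, 0.0, 1.0, 0.0]),  # never labeled in training
}
region = np.array([0.1, 0.0, 0.9, 0.1])
label, _ = classify_region(region, vocab)
```

Because the vocabulary is just a dictionary of name-to-embedding pairs, new classes can be added at test time without retraining the detector.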
Libraries
Use these libraries to find Open Vocabulary Object Detection models and implementations.

Most implemented papers
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
Two popular forms of weak supervision used in open-vocabulary detection (OVD) are pretrained CLIP models and image-level supervision.
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection.
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network
The advancement of object detection (OD) in open-vocabulary and open-world scenarios is a critical challenge in computer vision.
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models.
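F-VLM keeps the vision-language backbone frozen and, at inference, combines the trained detector's class scores with region scores from the frozen VLM. A hedged sketch of that geometric score fusion; the `alpha` weight and the toy score values are illustrative (F-VLM tunes the weighting separately for base and novel classes):

```python
import numpy as np

def fuse_scores(det_scores, vlm_scores, alpha=0.35):
    """Geometric fusion of detector scores with frozen-VLM region scores:
    base classes lean on the detector, novel classes on the VLM."""
    det_scores = np.asarray(det_scores, dtype=float)
    vlm_scores = np.asarray(vlm_scores, dtype=float)
    return det_scores ** (1 - alpha) * vlm_scores ** alpha

# Toy scores for three classes: the detector is confident on a base class,
# while the frozen VLM rescues a novel class the detector under-scores.
det = np.array([0.90, 0.05, 0.10])   # base, background, novel
vlm = np.array([0.60, 0.10, 0.80])
fused = fuse_scores(det, vlm)
```

The geometric mean keeps a high fused score only when neither source strongly rejects the class, which is why the novel class (high VLM score, low detector score) still ends up above background.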
Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts.
Open-vocabulary Attribute Detection
The objective of this novel task and benchmark is to probe the object-level attribute information learned by vision-language models.
Learning Object-Language Alignments for Open-Vocabulary Object Detection
In this paper, we propose a novel open-vocabulary object detection framework that learns directly from image-text pairs.
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
We demonstrate for the first time that using a text2image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable.
Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space
Specifically, cheap scene graph supervision data can be easily obtained by parsing image language descriptions into semantic graphs.
Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection
Current methods for open-vocabulary object detection (OVOD) rely on a pre-trained vision-language model (VLM) to acquire the recognition ability.