Open Vocabulary Object Detection

56 papers with code • 4 benchmarks • 6 datasets

Open-vocabulary detection (OVD) aims to generalize beyond the limited number of base classes labeled during the training phase. The goal is to detect novel classes defined by an unbounded (open) vocabulary at inference.
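The classification step shared by many OVD methods can be sketched as follows. This assumes that region embeddings and class-name embeddings already live in a shared vision-language space, as with CLIP. Each region is scored by cosine similarity against text embeddings of whatever class names are supplied at inference, so extending the vocabulary only means adding dictionary entries. The toy 3-dimensional vectors and the temperature value are made up for illustration. `classify_region` is a hypothetical helper, not any library's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def softmax(scores, temperature=0.01):
    """Numerically stable softmax; the temperature is an illustrative choice."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_region(region_embedding, text_embeddings):
    """Score one region against a vocabulary given at inference time.

    text_embeddings maps class name -> embedding (hypothetical here; in
    practice these would come from a text encoder such as CLIP's).
    """
    names = list(text_embeddings)
    sims = [cosine(region_embedding, text_embeddings[n]) for n in names]
    return dict(zip(names, softmax(sims)))

# "zebra" can be added to the vocabulary without retraining anything.
vocab = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "zebra": [0.0, 0.0, 1.0]}
scores = classify_region([0.1, 0.2, 0.9], vocab)
print(max(scores, key=scores.get))  # → zebra
```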

Most implemented papers

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

mmaaz60/mvits_for_class_agnostic_od 7 Jul 2022

Two popular forms of weak supervision used in open-vocabulary detection (OVD) are a pretrained CLIP model and image-level supervision.

Exploiting Unlabeled Data with Vision and Language Models for Object Detection

xiaofeng94/vl-plm 18 Jul 2022

We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection.
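Pseudo-labeling with a vision-language model can be sketched like this. Class-agnostic proposals on an unlabeled image are scored per class by the VLM, the scores are fused with the proposal's objectness, and only confident boxes are kept as pseudo labels. This is a generic sketch under stated assumptions, not the paper's exact pipeline. The fusion rule (a plain mean), the 0.8 threshold, and the `Proposal`/`pseudo_label` names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    box: tuple          # (x1, y1, x2, y2)
    objectness: float   # class-agnostic score from a region proposal network
    vlm_scores: dict    # class name -> score from a vision-language model

def pseudo_label(proposals, threshold=0.8):
    """Turn scored proposals on an unlabeled image into pseudo boxes.

    Assumed fusion rule: mean of objectness and the best VLM class score.
    """
    labels = []
    for p in proposals:
        cls, vlm = max(p.vlm_scores.items(), key=lambda kv: kv[1])
        fused = 0.5 * (p.objectness + vlm)
        if fused >= threshold:
            labels.append((p.box, cls, fused))
    return labels

props = [
    Proposal((10, 10, 50, 50), 0.95, {"cat": 0.90, "dog": 0.05}),
    Proposal((60, 20, 90, 80), 0.40, {"cat": 0.30, "dog": 0.60}),
]
print(pseudo_label(props))  # only the first, confident proposal survives
```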

OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network

om-ai-lab/OmDet 10 Sep 2022

Advancing object detection (OD) in open-vocabulary and open-world scenarios is a critical challenge in computer vision.

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

google-research/google-research 30 Sep 2022

We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models.
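In F-VLM, the vision-language backbone stays frozen and only a detector head is trained on base categories. At inference, the detector's class scores are combined with the frozen VLM's region scores. A minimal sketch follows, assuming a geometric-mean ensemble with separate weights for base (seen) and novel categories. The `alpha`/`beta` defaults are placeholders rather than the paper's tuned values, and `ensemble_score` is a hypothetical helper.

```python
def ensemble_score(det_score, vlm_score, is_base, alpha=0.35, beta=0.65):
    """Geometric-mean ensemble of detector and frozen-VLM class scores.

    A weight w in [0, 1] sets how much the VLM score counts: base categories
    use alpha (leaning on the trained detector), novel categories use beta
    (leaning on the frozen VLM). Both defaults are placeholders.
    """
    w = alpha if is_base else beta
    return det_score ** (1 - w) * vlm_score ** w

# Same raw scores, different weighting for a seen vs. an unseen class.
print(ensemble_score(0.8, 0.2, is_base=True))
print(ensemble_score(0.8, 0.2, is_base=False))
```

A geometric mean rather than an arithmetic one means a near-zero score from either source pulls the combined score down sharply, which suppresses boxes that only one model believes in.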

Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models

machengcheng2016/Subspace-Prompt-Learning 4 Nov 2022

Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts.

Open-vocabulary Attribute Detection

OVAD-Benchmark/ovad-bechmark-code CVPR 2023

The objective of this novel task and benchmark is to probe object-level attribute information learned by vision-language models.

Learning Object-Language Alignments for Open-Vocabulary Object Detection

clin1223/vldet 27 Nov 2022

In this paper, we propose a novel open-vocabulary object detection framework that learns directly from image-text pair data.
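Learning from image-text pairs requires deciding which caption word goes with which image region. One way to frame this is as a one-to-one set matching between words and regions that maximizes total similarity. The sketch below assumes a precomputed word-region similarity matrix. It finds the optimal assignment by brute force over permutations, which only works for tiny inputs; a practical system would use the Hungarian algorithm instead. The matrix values and `match_words_to_regions` are hypothetical.

```python
from itertools import permutations

def match_words_to_regions(sim):
    """Best one-to-one assignment of words (rows) to regions (columns).

    sim[i][j] is the similarity between word i and region j.
    Exhaustive search: fine for a toy example, exponential in general.
    """
    n_words, n_regions = len(sim), len(sim[0])
    assert n_words <= n_regions, "need at least as many regions as words"
    best, best_score = None, float("-inf")
    for cols in permutations(range(n_regions), n_words):
        score = sum(sim[i][c] for i, c in enumerate(cols))
        if score > best_score:
            best, best_score = cols, score
    return list(enumerate(best)), best_score

# Two caption words, three candidate regions (made-up similarities).
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
print(match_words_to_regions(sim))
```

Greedily giving word 0 its top region (column 0) would leave word 1 with a poor match. The joint assignment instead pairs word 0 with region 1 and word 1 with region 0.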

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

yoctta/xpaste 7 Dec 2022

We demonstrate for the first time that using a text2image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable.
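X-Paste's contribution concerns where the pasted objects come from: images generated by a text-to-image model, or crawled images filtered by a zero-shot recognizer. The pasting step itself is ordinary Copy-Paste augmentation, which copies an object's pixels onto a background wherever its mask is set. The sketch below shows that step on single-channel images stored as nested lists. It omits the mask and box bookkeeping a real instance-segmentation pipeline also needs, and `paste_object` is a hypothetical helper.

```python
def paste_object(background, obj, mask, top, left):
    """Return a copy of background with obj pasted where mask is 1.

    Pixels that would fall outside the background are skipped.
    """
    out = [row[:] for row in background]  # leave the input untouched
    for i, (obj_row, mask_row) in enumerate(zip(obj, mask)):
        for j, (pixel, m) in enumerate(zip(obj_row, mask_row)):
            y, x = top + i, left + j
            if m and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = pixel
    return out

bg = [[0] * 4 for _ in range(4)]
obj = [[7, 7], [7, 7]]
mask = [[1, 0], [1, 1]]  # L-shaped object silhouette
for row in paste_object(bg, obj, mask, top=1, left=2):
    print(row)
```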

Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space

zyong812/VS3_CVPR23 CVPR 2023

Cheap scene graph supervision data can be obtained easily by parsing language descriptions of images into semantic graphs.

Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection

hikvision-research/opera ICCV 2023

Current methods for open-vocabulary object detection (OVOD) rely on a pre-trained vision-language model (VLM) to acquire the recognition ability.
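A common way for a detector to acquire recognition ability from a pretrained VLM is embedding distillation. The detector's per-object embeddings are pulled toward the VLM's embeddings of the same regions. The sketch below shows a generic L1 loss on L2-normalized embeddings, in the style popularized by ViLD. It is not necessarily the objective this paper uses for DETR. The function names and toy vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l1_distill_loss(student_embs, teacher_embs):
    """Mean L1 distance between normalized student/teacher embedding pairs.

    student_embs: detector embeddings, one per object.
    teacher_embs: VLM embeddings of the matching regions.
    """
    assert student_embs and len(student_embs) == len(teacher_embs)
    total = 0.0
    for s, t in zip(student_embs, teacher_embs):
        s, t = normalize(s), normalize(t)
        total += sum(abs(a - b) for a, b in zip(s, t))
    return total / len(student_embs)

print(l1_distill_loss([[3.0, 4.0]], [[0.0, 1.0]]))  # |0.6-0| + |0.8-1| ≈ 0.8
```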