Open Vocabulary Object Detection

28 papers with code • 4 benchmarks • 5 datasets

Open-vocabulary detection (OVD) aims to generalize beyond the limited number of base classes labeled during the training phase. The goal is to detect novel classes defined by an unbounded (open) vocabulary at inference.
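As a rough illustration of how a class vocabulary can be supplied at inference, the sketch below scores one region embedding against text embeddings of class names and keeps the best match. It is a minimal toy under stated assumptions, not the method of any paper listed here: the function names `cosine` and `classify_region`, the 3-dimensional embeddings, and the `threshold` value are all hypothetical, and a real system would obtain both embeddings from a pretrained vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_region(region_embedding, text_embeddings, threshold=0.5):
    """Return (label, score) for the most similar class name.

    Returns (None, score) when the best score is below `threshold`,
    i.e. the region is treated as background. The threshold value is
    an illustrative assumption.
    """
    best_label, best_score = None, -1.0
    for label, embedding in text_embeddings.items():
        score = cosine(region_embedding, embedding)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy, hand-written embeddings (assumption: real ones come from a text encoder).
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],    # base class seen during training
    "dog": [0.0, 1.0, 0.0],    # base class seen during training
    "zebra": [0.0, 0.0, 1.0],  # novel class, added to the vocabulary at inference
}
region = [0.1, 0.2, 0.9]       # embedding of one detected region

label, score = classify_region(region, text_embeddings)
```

Because the vocabulary is just a dictionary of text embeddings, a novel class such as "zebra" is added by extending that dictionary, with no retraining of the classifier in this sketch.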

Most implemented papers

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

tensorflow/tpu ICLR 2022

On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.

Simple Open-Vocabulary Object Detection with Vision Transformers

google-research/scenic 12 May 2022

Combining simple architectures with large-scale pre-training has led to massive improvements in image classification.

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

peixianchen/medet 22 Jun 2022

Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary.

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

google-research/google-research CVPR 2023

We present Region-aware Open-vocabulary Vision Transformers (RO-ViT) - a contrastive image-text pretraining recipe to bridge the gap between image-level pretraining and open-vocabulary object detection.

Open-Vocabulary Object Detection Using Captions

alirezazareian/ovr-cnn CVPR 2021

Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models.

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

salesforce/pb-ovd 18 Nov 2021

To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs.

RegionCLIP: Region-based Language-Image Pretraining

microsoft/regionclip CVPR 2022

We show that directly applying contrastive language-image models such as CLIP to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans.

Detecting Twenty-thousand Classes using Image-level Supervision

facebookresearch/Detic 7 Jan 2022

For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without finetuning.

Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

mengqidyangge/hierkd CVPR 2022

Open-vocabulary object detection aims to detect novel object categories beyond the training set.

Open-Vocabulary DETR with Conditional Matching

yuhangzang/ov-detr 22 Mar 2022

We propose a novel open-vocabulary detector based on DETR, hence the name OV-DETR, which, once trained, can detect any object given its class name or an exemplar image.