Zero-Shot Object Detection

35 papers with code • 7 benchmarks • 6 datasets

Zero-shot object detection (ZSD) is the task of object detection where no visual training data is available for some of the target object classes.
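
One common way to handle classes with no visual training data is to compare a detected region's embedding against semantic embeddings of the class names, so an unseen class can be predicted from its description alone. The following is a minimal sketch of that idea, not any listed paper's method: the 3-d vectors, the class names, and the helper names `cosine` and `classify_region` are all made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_region(region_embedding, class_embeddings):
    """Return the class whose semantic embedding is most similar to the region."""
    return max(
        class_embeddings,
        key=lambda name: cosine(region_embedding, class_embeddings[name]),
    )


# Toy semantic embeddings (hypothetical values, not from a real word-embedding model).
class_embeddings = {
    "horse": [0.9, 0.1, 0.0],  # seen class: boxes were available at training time
    "car": [0.0, 0.1, 0.9],    # seen class
    "zebra": [0.8, 0.6, 0.0],  # unseen class: only the semantic embedding is known
}

# Embedding of a detected region, assumed to live in the same space as the classes.
region = [0.7, 0.6, 0.1]
predicted = classify_region(region, class_embeddings)
print(predicted)
```

In this toy setup the region is assigned to the unseen class because its embedding lies closest to that class's semantic vector, even though no zebra boxes were used to build the classifier.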

(Image credit: Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts)


Most implemented papers

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

idea-research/groundingdino 9 Mar 2023

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.
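
The excerpt only names language-guided query selection; it does not spell out the mechanics. One plausible reading, sketched below under that assumption, is to score each image feature by its best dot-product match against any text feature and keep the top-scoring positions as decoder queries. The function name `select_queries` and the toy 2-d features are hypothetical and not taken from the repository.

```python
def select_queries(image_features, text_features, num_queries):
    """Pick indices of the image features most similar to any text feature."""
    scored = []
    for index, image_vec in enumerate(image_features):
        # Best dot-product match between this image feature and all text tokens.
        best_match = max(
            sum(a * b for a, b in zip(image_vec, text_vec))
            for text_vec in text_features
        )
        scored.append((best_match, index))
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:num_queries]]


# Toy features (made-up values): four image positions, two text tokens.
image_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.1, 0.0]]
text_features = [[1.0, 0.0], [0.2, 0.2]]

selected = select_queries(image_features, text_features, num_queries=2)
print(selected)
```

The selected indices would then seed the decoder queries, so the text prompt influences which image regions the detector attends to.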

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

computer-vision-in-the-wild/cvinw_readings 19 Apr 2022

In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.

Learning Open-World Object Proposals without Learning to Classify

mcahny/object_localization_network 15 Aug 2021

In this paper, we identify that the problem is that the binary classifiers in existing proposal methods tend to overfit to the training categories.

Zero-Shot Instance Segmentation

zhengye1995/Zero-shot-Instance-Segmentation CVPR 2021

We follow this motivation and propose a new task set named zero-shot instance segmentation (ZSI).

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

tensorflow/tpu ICLR 2022

On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.

Polarity Loss for Zero-shot Object Detection

KennithLi/Awesome-Zero-Shot-Object-Detection 22 Nov 2018

This setting gives rise to the need for correct alignment between visual and semantic concepts, so that the unseen objects can be identified using only their semantic attributes.

Grounded Language-Image Pre-training

microsoft/GLIP CVPR 2022

The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.

Scaling Open-Vocabulary Object Detection

google-research/scenic NeurIPS 2023

However, with OWL-ST, we can scale to over 1B examples, yielding further large improvement: With an L/14 architecture, OWL-ST improves AP on LVIS rare classes, for which the model has seen no human box annotations, from 31.2% to 44.6% (43% relative improvement).
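
The quoted 43% relative improvement can be checked from the two AP values in the excerpt. The short calculation below uses only those two numbers; the variable names are illustrative.

```python
# AP on LVIS rare classes, as quoted in the excerpt (in percent).
ap_before = 31.2
ap_after = 44.6

# Absolute gain in AP points and gain relative to the starting value.
absolute_gain = ap_after - ap_before
relative_gain = absolute_gain / ap_before

print(f"absolute gain: {absolute_gain:.1f} AP points")
print(f"relative gain: {relative_gain:.1%}")
```

The absolute gain is 13.4 AP points, and dividing by the starting value of 31.2 gives roughly 0.43, which matches the stated relative improvement.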

YOLO-World: Real-Time Open-Vocabulary Object Detection

ailab-cvc/yolo-world CVPR 2024

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools.

Zero-Shot Object Detection by Hybrid Region Embedding

KennithLi/Awesome-Zero-Shot-Object-Detection 16 May 2018

Object detection is considered as one of the most challenging problems in computer vision, since it requires correct prediction of both classes and locations of objects in images.