Open Vocabulary Attribute Detection

12 papers with code • 2 benchmarks • 1 dataset

Open-Vocabulary Attribute Detection (OVAD) is a task that aims to detect and recognize an open set of objects and their associated attributes in an image. Objects and attributes are specified by text queries at inference time; the model has no prior knowledge during training of which classes will be tested.
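The inference idea can be sketched as follows: embed the text queries and each detected region in a shared space, then score attributes by cosine similarity. This is a minimal illustration, not any specific paper's method — the random vectors below are stand-ins for a real vision-language encoder such as CLIP, and the attribute list is hypothetical.

```python
import numpy as np

def cosine_scores(region_feat, text_feats):
    """Cosine similarity between one region embedding and each text-query embedding."""
    region = region_feat / np.linalg.norm(region_feat)
    texts = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return texts @ region

# Attribute queries are supplied only at inference time (hypothetical examples).
attribute_queries = ["red", "striped", "furry", "metallic"]

rng = np.random.default_rng(0)
dim = 512  # a typical CLIP embedding size
text_feats = rng.normal(size=(len(attribute_queries), dim))  # stand-in for a text encoder
region_feat = rng.normal(size=dim)                           # stand-in for a detected region's feature

scores = cosine_scores(region_feat, text_feats)
best = attribute_queries[int(np.argmax(scores))]
print(best)
```

Because the query set is just a list of strings, new attributes can be tested by appending prompts, with no retraining.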


Most implemented papers

Learning Transferable Visual Models From Natural Language Supervision

openai/CLIP 26 Feb 2021

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
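CLIP replaces a fixed category head with text prompts: class names are wrapped in templates like "a photo of a {class}", embedded by the text encoder, and classification becomes a softmax over scaled image-text cosine similarities. The sketch below shows that recipe with random stand-in embeddings in place of the real CLIP encoders; the class list and temperature value are illustrative assumptions.

```python
import numpy as np

def zero_shot_probs(image_feat, text_feats, temperature=100.0):
    """Softmax over temperature-scaled cosine similarities (CLIP-style zero-shot classifier)."""
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = temperature * (txt @ img)
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

classes = ["dog", "cat", "car"]  # hypothetical label set
prompts = [f"a photo of a {c}" for c in classes]  # CLIP-style prompt templates

rng = np.random.default_rng(1)
text_feats = rng.normal(size=(len(prompts), 512))  # stand-in for the text encoder
image_feat = rng.normal(size=512)                  # stand-in for the image encoder

probs = zero_shot_probs(image_feat, text_feats)
print(classes[int(np.argmax(probs))])
```

Swapping the prompt list changes the classifier instantly, which is what makes CLIP a common backbone for open-vocabulary detection and attribute tasks.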

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

salesforce/lavis 30 Jan 2023

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

salesforce/lavis 28 Jan 2022

Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

salesforce/lavis NeurIPS 2021

Most existing methods employ a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens.

Reproducible scaling laws for contrastive language-image learning

laion-ai/scaling-laws-openclip CVPR 2023

To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.

Open-Vocabulary Object Detection Using Captions

alirezazareian/ovr-cnn CVPR 2021

Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models.

Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts

zengyan-97/x-vlm 16 Nov 2021

Most existing methods in vision language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts.

Localized Vision-Language Matching for Open-vocabulary Object Detection

lmb-freiburg/locov 12 May 2022

In this work, we propose an open-vocabulary object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes.

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

mmaaz60/mvits_for_class_agnostic_od 7 Jul 2022

Two popular forms of weak-supervision used in open-vocabulary detection (OVD) include pretrained CLIP model and image-level supervision.

Open-vocabulary Attribute Detection

OVAD-Benchmark/ovad-bechmark-code CVPR 2023

The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models.