Zero-Shot Image Classification

41 papers with code • 3 benchmarks • 4 datasets

Zero-shot image classification asks a model to assign images to categories for which it has seen no labeled training examples, typically by comparing image embeddings against text embeddings of class names or descriptions produced by a vision-language model such as CLIP.
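
As a point of reference, here is a minimal sketch of that recipe using the Hugging Face transformers CLIP implementation; the checkpoint name, image path, and candidate labels are placeholders rather than anything prescribed by the papers below.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Load a pre-trained CLIP checkpoint (any CLIP-style checkpoint works similarly).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate classes are specified purely as text; no labeled training images are needed.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a platypus"]
image = Image.open("example.jpg")  # placeholder path

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them into class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```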

Most implemented papers

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

computer-vision-in-the-wild/cvinw_readings 19 Apr 2022

In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

kakaobrain/coyo-dataset 11 Feb 2021

In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset.

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

tensorflow/tpu ICLR 2022

On COCO, ViLD outperforms the previous state of the art by 4.8 on novel AP and 11.4 on overall AP.

LiT: Zero-Shot Transfer with Locked-image text Tuning

google-research/vision_transformer CVPR 2022

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training.
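
A minimal sketch of that idea, assuming generic image_encoder and text_encoder modules that map batches to embedding vectors; the helper names and the fixed temperature are simplifications (the paper uses a learnable temperature), not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def locked_image_tuning_step(image_encoder, text_encoder, images, texts,
                             optimizer, temperature=0.07):
    """One contrastive-tuning step with a frozen ("locked") image tower."""
    image_encoder.eval()
    with torch.no_grad():                                # locked image tower: no gradients flow into it
        img_emb = F.normalize(image_encoder(images), dim=-1)
    txt_emb = F.normalize(text_encoder(texts), dim=-1)   # only the text tower is updated

    logits = img_emb @ txt_emb.t() / temperature          # pairwise cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric InfoNCE loss: matched image-text pairs lie on the diagonal.
    loss = (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```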

Reproducible scaling laws for contrastive language-image learning

laion-ai/scaling-laws-openclip CVPR 2023

To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.
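
For context, the released checkpoints can be loaded through the open_clip API roughly as below; the model and pretrained tags are illustrative and may not match the exact configurations studied in the paper.

```python
import torch
import open_clip
from PIL import Image

# Load an OpenCLIP model trained on LAION data (the pretrained tag is an example).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so that dot products are cosine similarities.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```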

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

mendelxu/zsseg.baseline 29 Dec 2021

However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation makes per-pixel predictions, while CLIP reasons about whole images.

DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

zjukg/DUET 4 Jul 2022

Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's ability to disentangle semantic attributes from images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics in the face of attribute co-occurrence and imbalance; and (3) proposed a multi-task learning policy for considering multi-modal objectives.

What does a platypus look like? Generating customized prompts for zero-shot image classification

sarahpratt/cupl ICCV 2023

Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference.
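
The underlying mechanism can be sketched as prompt ensembling: embed several natural-language descriptions per class, average them into one classifier weight per class, and score images against those weights. The class names, descriptions, and checkpoint below are illustrative; CuPL generates the descriptions with a language model rather than hard-coding them.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Example per-class descriptions (in CuPL these come from an LLM, not a hand-written dict).
class_prompts = {
    "platypus": [
        "a photo of a platypus, a duck-billed mammal with a beaver-like tail",
        "a photo of a small furry animal with a flat bill swimming in a river",
    ],
    "echidna": [
        "a photo of an echidna, a spiny anteater with a long snout",
        "a photo of a small spine-covered mammal foraging on the ground",
    ],
}

class_names = list(class_prompts.keys())
class_embeddings = []
with torch.no_grad():
    # Average each class's prompt embeddings into a single classifier weight.
    for name in class_names:
        text_inputs = processor(text=class_prompts[name], return_tensors="pt", padding=True)
        emb = model.get_text_features(**text_inputs)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        class_embeddings.append(emb.mean(dim=0))
    weights = torch.stack(class_embeddings)
    weights = weights / weights.norm(dim=-1, keepdim=True)

    image_inputs = processor(images=Image.open("example.jpg"), return_tensors="pt")  # placeholder path
    img = model.get_image_features(**image_inputs)
    img = img / img.norm(dim=-1, keepdim=True)

scores = (img @ weights.T).softmax(dim=-1)
print(dict(zip(class_names, scores[0].tolist())))
```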

Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning

andron00e/sparsecbm 4 Apr 2024

We propose a novel architecture and method of explainable classification with Concept Bottleneck Models (CBMs).
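
As a rough illustration of the concept-bottleneck idea (a generic sketch only; it omits the sparsity and Gumbel-softmax components that are the paper's actual contribution), a classifier first predicts interpretable concept scores and then makes its decision from those scores alone:

```python
import torch
import torch.nn as nn

class ConceptBottleneckHead(nn.Module):
    """Minimal concept-bottleneck classifier sketch (not the paper's exact architecture)."""

    def __init__(self, feature_dim: int, num_concepts: int, num_classes: int):
        super().__init__()
        self.to_concepts = nn.Linear(feature_dim, num_concepts)  # predicts interpretable concept scores
        self.to_classes = nn.Linear(num_concepts, num_classes)   # final decision uses only the concepts

    def forward(self, features: torch.Tensor):
        concepts = torch.sigmoid(self.to_concepts(features))
        logits = self.to_classes(concepts)
        return logits, concepts  # returned concepts expose which attributes drove the prediction
```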