Image Retrieval

668 papers with code • 54 benchmarks • 75 datasets

Image Retrieval is a fundamental, long-standing computer vision task: given a query, find similar images in a large database. It is often treated as a form of fine-grained, instance-level classification. Alongside classification and detection, it is integral to image recognition, and it also has substantial business value, helping users discover images that match their interests or requirements based on visual similarity or other criteria.

(Image credit: DELF)
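
The task description above can be illustrated with a minimal sketch, assuming each image has already been mapped to a fixed-length embedding vector and that similarity is measured by cosine similarity. The function names (`l2_normalize`, `cosine_similarity`, `retrieve`), the file names, and the three-dimensional vectors are all hypothetical and hand-written for illustration; a real system would typically obtain embeddings from a learned encoder and use an approximate nearest-neighbour index for large databases.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, database, top_k=3):
    """Return the top_k (image_id, score) pairs, most similar first."""
    scored = [
        (image_id, cosine_similarity(query, embedding))
        for image_id, embedding in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values, not produced by any real model).
database = {
    "tower_day.jpg": [0.9, 0.1, 0.0],
    "tower_night.jpg": [0.8, 0.3, 0.1],
    "beach.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # embedding of the query image

results = retrieve(query, database, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

This brute-force scan compares the query against every database entry, which is fine for a toy example but scales linearly with database size.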

Benchmarks

These leaderboards are used to track progress in Image Retrieval.

Dataset	Best Model
ROxford (Medium)	Hypergraph propagation+Community selection
RParis (Medium)	Hypergraph propagation
ROxford (Hard)	SuperGlobal
RParis (Hard)	SuperGlobal
CREPE (Compositional REPresentation Evaluation)	ViT-L-14 (LAION400M)
Flickr30K 1K test	X-VLM (base)
Fashion IQ	SPRC
SOP	Unicom+ViT-L@336px
Oxf5k	Offline Diffusion
Flickr30k-CN	InternVL-G-FT
CIRR	SPRC
iNaturalist	Unicom+ViT-L@336px
Oxf105k	Offline Diffusion
MUGE Retrieval	CN-CLIP (ViT-H/14)
COCO-CN	CN-CLIP (ViT-H/14)
CUB-200-2011	CGD (MG/SG)
CARS196	CGD (MG/SG)
Par6k	Offline Diffusion
Par106k	Offline Diffusion
In-Shop	CGD (SG/GS)
Flickr30k	BLIP-2 ViT-G (zero-shot, 1K test set)
MS COCO	BLIP-2 ViT-G (fine-tuned)
AmsterTime	DINOv2 distilled (ViT-L/14 frozen)
PhotoChat	PaCE
ConQA Descriptive	CLIP
ConQA Conceptual	CLIP
DeepFashion - Consumer-to-shop	CTL Model (ResNet50-IBN-A, 320x320)
Exact Street2Shop	CTL Model (ResNet50-IBN-A, 320x320)
LaSCo	CASE
DeepPatent	SwinV2
24/7 Tokyo	HED-N-GAN
street2shop - topwear	Ranknet
INRIA Holidays	MultiGrain R50 @ 800
Paris6k	IME layer
Oxford5k	GNN-Reranking
AIC-ICC	ERNIE-ViL2.0
WIT	WIT-ALL
CBVS	UniCLP
NUS-WIDE	LESA
DeepFashion	STIR
Google Landmarks Dataset v2 (retrieval, testing)	ResNet101+ArcFace GLDv2-train-clean
Google Landmarks Dataset v2 (retrieval, validation)	ResNet101+ArcFace GLDv2-train-clean
INSTRE	IME layer
CIFAR-10	Custom: 3 conv + 2 fcn
ImageCoDe	ContextualCLIP
PKU-Reid	IHDA
PKU SketchRe-ID Dataset	IHDA
FETA Car-Manuals	FETA's CLIP-MIL (Many-Shot Image-to-text)
FooDI-ML (Global)	ADAPT-I2T
FooDI-ML (Spain)	ADAPT-I2T
Localized Narratives	OPT
ICFG-PEDES	SSAN
RUC-CAS-WenLan	CMCL
ROxford Medium without fine-tuning	HesAff–rSIFT–VLAD

Libraries

Use these libraries to find Image Retrieval models and implementations.

Library	Papers	Stars
huggingface/transformers	4	125,629
OML-Team/open-metric-learning	4	766
kornia/kornia	2	9,439
salesforce/lavis	2	8,804

See all 10 libraries.

Subtasks

Medical Image Retrieval

Multi-Label Image Retrieval

Face Image Retrieval

Video-to-Shop

Image Instance Retrieval

Semi-Supervised Sketch Based Image Retrieval

Chat-based Image Retrieval

Latest papers with no code

Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval

no code yet • 1 May 2024

However, we conjecture that this approach has a downside: the projection module distorts the original image representation and confines the resulting composed embeddings to the text-side.

Large Language Model Informed Patent Image Retrieval

no code yet • 30 Apr 2024

In patent prosecution, image-based retrieval systems for identifying similarities between current patent images and prior art are pivotal to ensure the novelty and non-obviousness of patent applications.

Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models

no code yet • 29 Apr 2024

Our contributions encompass the development of an innovative interactive image retrieval system, the integration of an LLM-based denoiser, the curation of a meticulously designed evaluation dataset, and thorough experimental validation.

Dual-Modal Prompting for Sketch-Based Image Retrieval

no code yet • 29 Apr 2024

In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval.

Learning text-to-video retrieval from image captioning

no code yet • 26 Apr 2024

In this paper, we make use of this progress and instantiate the image experts from two types of models: a text-to-image retrieval model to provide an initial backbone, and image captioning models to provide supervision signal into unlabeled videos.

Revisiting Relevance Feedback for CLIP-based Interactive Image Retrieval

no code yet • 25 Apr 2024

However, metric learning cannot handle differences in users' preferences, and requires data to train an image encoder.

CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching

no code yet • 25 Apr 2024

Moreover, all existing methods match crime-scene shoeprints to clean reference prints, yet our analysis shows matching to more informative tread depth maps yields better retrieval results.

DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

no code yet • 24 Apr 2024

This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models.

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

no code yet • 23 Apr 2024

Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.

Collaborative Visual Place Recognition through Federated Learning

no code yet • 20 Apr 2024

Visual Place Recognition (VPR) aims to estimate the location of an image by treating it as a retrieval problem.
