Image Retrieval

665 papers with code • 54 benchmarks • 75 datasets

Image Retrieval is a fundamental, long-standing computer vision task: given a query, find the most similar images in a large database. It is often treated as a form of fine-grained, instance-level classification. Beyond its role in image recognition alongside classification and detection, it has substantial business value, helping users discover images that match their interests or requirements on the basis of visual similarity or other criteria.
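
As a rough illustration of the task described above, retrieval is commonly implemented as nearest-neighbor search over image descriptors. The sketch below assumes each image has already been mapped to a fixed-length vector by some model (none is specified here) and ranks database entries by cosine similarity. The function names `l2_normalize` and `retrieve` and the toy 2-D vectors are hypothetical, chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each vector (last axis) to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve(query, database, k=3):
    """Return indices and cosine similarities of the k database rows closest to query."""
    q = l2_normalize(np.asarray(query, dtype=np.float64))
    db = l2_normalize(np.asarray(database, dtype=np.float64))
    scores = db @ q  # cosine similarity, since both sides are unit length
    top = np.argsort(-scores, kind="stable")[:k]  # highest similarity first
    return top, scores[top]


# Toy 2-D descriptors standing in for real image embeddings (assumption).
database = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
query = np.array([0.9, 0.1])
indices, scores = retrieve(query, database, k=2)
```

With these toy vectors, `indices` holds rows 0 and 2, the two database entries pointing in the direction closest to the query. At real database sizes an approximate nearest-neighbor index would usually replace the exhaustive matrix product.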

(Image credit: DELF)

Benchmarks

These leaderboards track progress in Image Retrieval.

Dataset	Best Model
ROxford (Medium)	Hypergraph propagation+Community selection
RParis (Medium)	Hypergraph propagation
ROxford (Hard)	SuperGlobal
RParis (Hard)	SuperGlobal
CREPE (Compositional REPresentation Evaluation)	ViT-L-14 (LAION400M)
Flickr30K 1K test	X-VLM (base)
Fashion IQ	SPRC
SOP	Unicom+ViT-L@336px
Oxf5k	Offline Diffusion
Flickr30k-CN	InternVL-G-FT
CIRR	SPRC
iNaturalist	Unicom+ViT-L@336px
Oxf105k	Offline Diffusion
MUGE Retrieval	CN-CLIP (ViT-H/14)
COCO-CN	CN-CLIP (ViT-H/14)
CUB-200-2011	CGD (MG/SG)
CARS196	CGD (MG/SG)
Par6k	Offline Diffusion
Par106k	Offline Diffusion
In-Shop	CGD (SG/GS)
Flickr30k	BLIP-2 ViT-G (zero-shot, 1K test set)
MS COCO	BLIP-2 ViT-G (fine-tuned)
AmsterTime	DINOv2 distilled (ViT-L/14 frozen)
PhotoChat	PaCE
ConQA Descriptive	CLIP
ConQA Conceptual	CLIP
DeepFashion - Consumer-to-shop	CTL Model (ResNet50-IBN-A, 320x320)
Exact Street2Shop	CTL Model (ResNet50-IBN-A, 320x320)
LaSCo	CASE
DeepPatent	SwinV2
24/7 Tokyo	HED-N-GAN
street2shop - topwear	Ranknet
INRIA Holidays	MultiGrain R50 @ 800
Paris6k	IME layer
Oxford5k	GNN-Reranking
AIC-ICC	ERNIE-ViL2.0
WIT	WIT-ALL
CBVS	UniCLP
NUS-WIDE	LESA
DeepFashion	STIR
Google Landmarks Dataset v2 (retrieval, testing)	ResNet101+ArcFace GLDv2-train-clean
Google Landmarks Dataset v2 (retrieval, validation)	ResNet101+ArcFace GLDv2-train-clean
INSTRE	IME layer
CIFAR-10	Custom: 3 conv + 2 fcn
ImageCoDe	ContextualCLIP
PKU-Reid	IHDA
PKU SketchRe-ID Dataset	IHDA
FETA Car-Manuals	FETA's CLIP-MIL (Many-Shot Image-to-text)
FooDI-ML (Global)	ADAPT-I2T
FooDI-ML (Spain)	ADAPT-I2T
Localized Narratives	OPT
ICFG-PEDES	SSAN
RUC-CAS-WenLan	CMCL
ROxford Medium without fine-tuning	HesAff–rSIFT–VLAD

Libraries

Use these libraries to find Image Retrieval models and implementations

Library	Papers	GitHub stars
huggingface/transformers	5	124,527
OML-Team/open-metric-learning	4	757
kornia/kornia	2	9,356
salesforce/lavis	2	8,674

See all 10 libraries.

Subtasks

Medical Image Retrieval

Multi-Label Image Retrieval

Face Image Retrieval

Video-to-Shop

Image Instance Retrieval

Semi-Supervised Sketch Based Image Retrieval

Chat-based Image Retrieval

Most implemented papers

Learning Deep Representations of Fine-grained Visual Descriptions

hanzhanggit/StackGAN-v2 • CVPR 2016

State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information.

Looking at Outfit to Parse Clothing

kyamagu/js-segment-annotator • 4 Mar 2017

This paper extends fully-convolutional neural networks (FCN) for the clothing parsing problem.

Combination of Multiple Global Descriptors for Image Retrieval

naver/cgd • arXiv 2019

Recent studies in image retrieval task have shown that ensembling different models and combining multiple global descriptors lead to performance improvement.
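
To make the idea of combining global descriptors concrete, here is a minimal sketch that pools one convolutional feature map in three ways, L2-normalizes each result, and concatenates them into a single vector. The choice of average (SPoC-style), max (MAC-style), and generalized-mean (GeM-style) pooling, the exponent `p=3`, and the random feature map are assumptions for illustration. The learned layers and training losses a real model would use are omitted, so this should not be read as the paper's implementation.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a 1-D vector to unit Euclidean length."""
    return v / max(np.linalg.norm(v), eps)


def spoc(fmap):
    """Average pooling over the spatial axes (SPoC-style)."""
    return fmap.mean(axis=(1, 2))


def mac(fmap):
    """Max pooling over the spatial axes (MAC-style)."""
    return fmap.max(axis=(1, 2))


def gem(fmap, p=3.0, eps=1e-6):
    """Generalized-mean pooling; p=1 is average pooling, large p approaches max."""
    return np.mean(np.clip(fmap, eps, None) ** p, axis=(1, 2)) ** (1.0 / p)


def combined_descriptor(fmap):
    """Concatenate the normalized pooled vectors and normalize the result."""
    parts = [l2_normalize(pool(fmap)) for pool in (spoc, mac, gem)]
    return l2_normalize(np.concatenate(parts))


rng = np.random.default_rng(0)
fmap = rng.random((8, 4, 4))  # hypothetical (channels, height, width) activations
desc = combined_descriptor(fmap)  # one vector of length 3 * channels
```

Normalizing each part before concatenation keeps any single pooling type from dominating the final similarity score purely because of its scale.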

Particular object retrieval with integral max-pooling of CNN activations

naver/deep-image-retrieval • 18 Nov 2015

Recently, image representation built upon Convolutional Neural Network (CNN) has been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations.

Stacked Cross Attention for Image-Text Matching

kuanghuei/SCAN • ECCV 2018

Prior work either simply aggregates the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or uses a multi-step attentional process to capture limited number of semantic alignments which is less interpretable.

CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

filipradenovic/cnnimageretrieval-pytorch • 8 Apr 2016

Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks.

Sampling Matters in Deep Embedding Learning

CompVis/metric-learning-divide-and-conquer • ICCV 2017

In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions.
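
For readers unfamiliar with the term, a margin-based loss on pairs can be sketched as follows. The form used here, max(0, alpha + y * (d - beta)) with y = +1 for same-class pairs and y = -1 otherwise, is one common formulation and is an assumption on our part, as are the values of `alpha` and `beta`; the summary above does not spell out the exact definition.

```python
import numpy as np


def margin_loss(distances, same_class, alpha=0.2, beta=1.2):
    """Per-pair margin loss: max(0, alpha + y * (d - beta)).

    Same-class pairs (y = +1) are penalized when d > beta - alpha;
    different-class pairs (y = -1) are penalized when d < beta + alpha.
    alpha and beta are illustrative values, not taken from the source.
    """
    d = np.asarray(distances, dtype=np.float64)
    y = np.where(np.asarray(same_class, dtype=bool), 1.0, -1.0)
    return np.maximum(0.0, alpha + y * (d - beta))


# Four hypothetical embedding distances: two same-class, two different-class.
losses = margin_loss([0.5, 1.5, 0.8, 1.6], [True, True, False, False])
```

In this toy case only the second pair (same class but far apart) and the third pair (different classes but close together) incur a nonzero loss.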

Batch DropBlock Network for Person Re-identification and Beyond

daizuozhuo/batch-feature-erasing-network • ICCV 2019

In this paper, we propose the Batch DropBlock (BDB) Network which is a two branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch.

SoftTriple Loss: Deep Metric Learning Without Triplet Sampling

idstcv/SoftTriple • ICCV 2019

The set of triplet constraints has to be sampled within the mini-batch.

12-in-1: Multi-Task Vision and Language Representation Learning

facebookresearch/vilbert-multi-task • CVPR 2020

Much of vision-and-language research focuses on a small but diverse set of independent tasks and supporting datasets often studied in isolation; however, the visually-grounded language understanding skills required for success at these tasks overlap significantly.
