Image Retrieval

649 papers with code • 52 benchmarks • 74 datasets

Image Retrieval is a fundamental, long-standing computer vision task: given a query, find similar images in a large database. It is often framed as a form of fine-grained, instance-level classification. Beyond its role in image recognition alongside classification and detection, it has substantial business value, helping users discover images that match their interests or requirements on the basis of visual similarity or other criteria.
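As a rough illustration of the task, the sketch below ranks a small database by cosine similarity to a query. It assumes every image has already been mapped to a fixed-length descriptor by some feature extractor, which is not shown. The names `cosine_similarity` and `retrieve_top_k` and the toy 3-D vectors are hypothetical, and exhaustive search stands in for the approximate indexes that large databases would typically need.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query, database, k=3):
    """Return (index, score) pairs for the k database vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), idx) for idx, vec in enumerate(database)]
    # Highest similarity first; Python's sort is stable, so ties keep database order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Toy database of four hypothetical 3-D descriptors (real descriptors are much longer).
database = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

results = retrieve_top_k(query, database, k=2)
for idx, score in results:
    print(f"database[{idx}] similarity={score:.4f}")
```

Exhaustive comparison costs time linear in the database size per query, which is why it serves here only as a reference point for the ranking behaviour.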


Most implemented papers

VGGFace2: A dataset for recognising faces across pose and age

deepinsight/insightface 23 Oct 2017

The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise.

NetVLAD: CNN architecture for weakly supervised place recognition

Relja/netvlad CVPR 2016

We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph.

Fine-tuning CNN Image Retrieval with No Human Annotation

filipradenovic/cnnimageretrieval-pytorch 3 Nov 2017

We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval.

Large-Scale Image Retrieval with Attentive Deep Local Features

tensorflow/models ICCV 2017

We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature).

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

salesforce/lavis 30 Jan 2023

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

facebookresearch/vilbert-multi-task NeurIPS 2019

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.

Circle Loss: A Unified Perspective of Pair Similarity Optimization

layumi/Person_reID_baseline_pytorch CVPR 2020

This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$.
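To make the roles of $s_p$ and $s_n$ concrete, here is a minimal sketch of the Circle Loss objective for a single anchor, written from the formula as published. The function name `circle_loss`, the helper names, and the default values of `margin` and `gamma` are illustrative assumptions, not the settings of the linked repository.

```python
import math

def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def circle_loss(sp, sn, margin=0.25, gamma=64.0):
    """Circle Loss for one anchor.

    sp: within-class (positive) similarity scores
    sn: between-class (negative) similarity scores
    margin, gamma: illustrative hyperparameter values (assumed, not tuned)
    """
    o_p, o_n = 1.0 + margin, -margin          # optima for s_p and s_n
    delta_p, delta_n = 1.0 - margin, margin   # decision margins
    # Each score gets its own weight: the further from its optimum, the larger the weight.
    logit_p = [-gamma * max(o_p - s, 0.0) * (s - delta_p) for s in sp]
    logit_n = [gamma * max(s - o_n, 0.0) * (s - delta_n) for s in sn]
    # log(1 + sum_j exp(logit_n_j) * sum_i exp(logit_p_i))
    return _softplus(_logsumexp(logit_n) + _logsumexp(logit_p))

well_separated = circle_loss(sp=[0.95], sn=[0.05])
poorly_separated = circle_loss(sp=[0.30], sn=[0.70])
print(well_separated, poorly_separated)
```

The loss is close to zero when positives score high and negatives score low, and it grows when the two overlap, which matches the stated goal of maximizing $s_p$ while minimizing $s_n$.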

VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

fartashf/vsepp 18 Jul 2017

We present a new technique for learning visual-semantic embeddings for cross-modal retrieval.
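The hard-negative idea named in the title can be sketched as a max-of-hinges ranking loss: for each matched image-caption pair, only the hardest negative in each direction is penalized. The code below is an illustration under assumptions. It takes a square similarity matrix whose rows are images and whose columns are captions, with matched pairs on the diagonal; the name `max_of_hinges_loss` and the margin value are hypothetical.

```python
def max_of_hinges_loss(sim, margin=0.2):
    """Sum over matched pairs of the hinge loss on the hardest negative in each direction.

    sim[i][j]: similarity between image i and caption j; sim[i][i] is the matched pair.
    margin: illustrative value (assumed).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        # Hardest negative caption for image i (same row, off-diagonal).
        hardest_caption = max(
            (sim[i][j] for j in range(n) if j != i), default=float("-inf")
        )
        # Hardest negative image for caption i (same column, off-diagonal).
        hardest_image = max(
            (sim[j][i] for j in range(n) if j != i), default=float("-inf")
        )
        total += max(0.0, margin + hardest_caption - positive)
        total += max(0.0, margin + hardest_image - positive)
    return total

# Toy 3x3 similarity matrix; pair 1 is confused with caption 2.
sim = [
    [0.90, 0.20, 0.10],
    [0.30, 0.80, 0.85],
    [0.10, 0.40, 0.70],
]
loss = max_of_hinges_loss(sim)
print(loss)
```

Replacing each `max` over negatives with a sum would give the more common sum-of-hinges ranking loss, in which easy negatives also contribute.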

Learning Deep Representations of Fine-grained Visual Descriptions

hanzhanggit/StackGAN-v2 CVPR 2016

State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information.

Looking at Outfit to Parse Clothing

kyamagu/js-segment-annotator 4 Mar 2017

This paper extends fully-convolutional neural networks (FCN) for the clothing parsing problem.