Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

25 May 2023 · Zheyuan Liu, Weixuan Sun, Damien Teney, Stephen Gould

Composed image retrieval aims to find the image that best matches a multi-modal user query consisting of a reference image and a text description. Existing methods commonly pre-compute image embeddings over the entire corpus and, at test time, compare these to a reference image embedding modified by the query text. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially when done independently of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, its computational cost is prohibitive for large-scale datasets since pre-computation of candidate embeddings is no longer possible. We propose to combine the merits of both schemes using a two-stage model. Our first stage adopts a conventional vector distance metric and performs fast pruning among candidates. Our second stage employs a dual-encoder architecture, which attends to the input reference-text-candidate triplet and re-ranks the candidates. Both stages utilize a vision-and-language pre-trained network, which has proven beneficial for various downstream tasks. Our method consistently outperforms state-of-the-art approaches on standard benchmarks for the task. Our implementation is available at https://github.com/Cuberick-Orion/Candidate-Reranking-CIR.
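The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `two_stage_retrieve`), the `rerank_score` callable standing in for the second-stage dual encoder, the choice of cosine similarity as the vector distance, and all toy numbers are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_retrieve(
    query_emb: Vector,
    candidate_embs: List[Vector],
    rerank_score: Callable[[int], float],
    k: int,
) -> List[int]:
    """Return the top-k stage-1 candidate indices, ordered by the stage-2 score."""
    # Stage 1: cheap pruning using pre-computed candidate embeddings.
    by_similarity = sorted(
        range(len(candidate_embs)),
        key=lambda i: cosine(query_emb, candidate_embs[i]),
        reverse=True,
    )
    shortlist = by_similarity[:k]
    # Stage 2: the expensive scorer only sees the shortlist, not the whole corpus.
    return sorted(shortlist, key=rerank_score, reverse=True)


# Toy data (hypothetical): a 2-D query embedding and four candidate embeddings.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]

# Hypothetical stand-in for a model that scores reference-text-candidate triplets.
toy_scores = {0: 0.2, 1: 0.9, 2: 0.7, 3: 0.1}

ranking = two_stage_retrieve(query, candidates, lambda i: toy_scores[i], k=2)
print(ranking)  # [2, 0]
```

In this toy run, candidate 1 has the highest second-stage score but never reaches the re-ranker because the first stage prunes it. That is the trade-off of the scheme: the shortlist size `k` bounds the cost of the second stage, but also caps the recall it can achieve.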

Code

Cuberick-Orion/Candidate-Reranking-… official

Cuberick-Orion/Bi-Blip4CIR

Tasks

Composed Image Retrieval (CoIR)

Image Retrieval

Re-Ranking

Retrieval

Datasets

Fashion IQ

CIRR

Results from the Paper

Ranked #2 on Image Retrieval on CIRR

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Image Retrieval	CIRR	Candidate Set Re-ranking	(Recall@5+Recall_subset@1)/2	80.9	#2
Image Retrieval	Fashion IQ	Candidate Set Re-ranking	(Recall@10+Recall@50)/2	62.15	#2
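The metric names in the table are averages of two recall values. A minimal sketch of how such a value could be computed follows, assuming Recall@K is the percentage of queries whose target image appears in the top K results, and that Recall_subset@K is the same quantity computed on rankings restricted to a small per-query candidate subset. The function name `recall_at_k` and all toy rankings are hypothetical and unrelated to the reported numbers.

```python
from typing import Sequence


def recall_at_k(
    rankings: Sequence[Sequence[str]], targets: Sequence[str], k: int
) -> float:
    """Percentage of queries whose target appears in the top-k of its ranking."""
    hits = sum(target in ranking[:k] for ranking, target in zip(rankings, targets))
    return 100.0 * hits / len(targets)


# Toy data (hypothetical): one ranked list of image ids per query.
targets = ["a", "a", "a", "b"]
full_rankings = [
    ["a", "b", "c", "d", "e", "f"],  # target at rank 1: hit at K=5
    ["b", "a", "c", "d", "e", "f"],  # target at rank 2: hit at K=5
    ["c", "d", "a", "b", "e", "f"],  # target at rank 3: hit at K=5
    ["f", "e", "d", "c", "a", "b"],  # target at rank 6: miss at K=5
]
# Rankings restricted to an assumed per-query subset of candidates.
subset_rankings = [
    ["a", "b", "c"],  # hit at K=1
    ["b", "a", "c"],  # miss at K=1
    ["c", "a", "b"],  # miss at K=1
    ["a", "b", "c"],  # miss at K=1 (target is "b")
]

recall_5 = recall_at_k(full_rankings, targets, k=5)
recall_subset_1 = recall_at_k(subset_rankings, targets, k=1)
average = (recall_5 + recall_subset_1) / 2
print(recall_5, recall_subset_1, average)  # 75.0 25.0 50.0
```

The Fashion IQ metric in the table has the same shape: under the same assumptions it would be `(recall_at_k(r, t, 10) + recall_at_k(r, t, 50)) / 2` over full rankings.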

Methods

Pruning • Test
