Image Retrieval with Multi-Modal Query

9 papers with code • 3 benchmarks • 2 datasets

The task of retrieving images from a database given a multi-modal (image-text) query. Specifically, the query text describes a modification to the query image, and the goal is to retrieve images that show the desired modification.
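
To make the task definition concrete, here is a minimal sketch of retrieval with a composed query. It assumes image and text embeddings already live in a shared vector space, and it composes them by element-wise sum. That composition is an illustrative placeholder, not the method of any paper listed on this page. The names `compose_query`, `retrieve`, and the toy vectors are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(image_emb, text_emb):
    # Assumption: a plain element-wise sum stands in for a learned
    # composition function of the query image and the modification text.
    return l2_normalize([a + b for a, b in zip(image_emb, text_emb)])


def retrieve(query_emb, database, top_k=3):
    """Rank database images by cosine similarity to the composed query."""
    scored = []
    for image_id, emb in database.items():
        emb = l2_normalize(emb)
        score = sum(q * e for q, e in zip(query_emb, emb))
        scored.append((score, image_id))
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


# Toy 3-d embeddings: axis 0 ~ "red", axis 1 ~ "long sleeves", axis 2 ~ "blue".
query_image = [1.0, 0.0, 0.0]    # a red, short-sleeved item
modification = [0.0, 1.0, 0.0]   # text: "make it long-sleeved"
database = {
    "red_short": [1.0, 0.0, 0.0],
    "red_long": [1.0, 1.0, 0.0],
    "blue_long": [0.0, 1.0, 1.0],
}
ranking = retrieve(compose_query(query_image, modification), database)
print(ranking)
```

In this toy setup the image that keeps the query's color and applies the requested change ranks first. Real systems replace the sum with a learned composition module and the hand-written vectors with encoder outputs.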

Most implemented papers

Show and Tell: A Neural Image Caption Generator

yashk2810/Image-Captioning CVPR 2015

Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.

A simple neural network module for relational reasoning

kimhc6028/relational-networks NeurIPS 2017

Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn.

FiLM: Visual Reasoning with a General Conditioning Layer

ethanjperez/film 22 Sep 2017

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation.
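
As a rough illustration of feature-wise linear modulation, the sketch below scales and shifts each feature channel with its own pair of coefficients. In FiLM these coefficients are predicted from a conditioning input such as a question; here they are passed in directly, and the function name `film` and the toy values are assumptions made for the example.

```python
def film(feature_maps, gammas, betas):
    """Apply a per-channel affine transform: gamma * x + beta.

    feature_maps: list of channels, each a flat list of activations.
    gammas, betas: one scale and one shift per channel (assumed to come
    from a separate network that reads the conditioning input).
    """
    return [
        [g * x + b for x in channel]
        for channel, g, b in zip(feature_maps, gammas, betas)
    ]


features = [[1.0, 2.0], [3.0, 4.0]]          # two channels, two activations each
modulated = film(features, gammas=[2.0, 0.0], betas=[0.5, 1.0])
print(modulated)
```

A scale of zero, as in the second channel, suppresses that channel entirely and leaves only the shift, which is one way the conditioning input can switch features on or off.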

Composing Text and Image for Image Retrieval - An Empirical Odyssey

google/tirg CVPR 2019

In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

HyeonwooNoh/DPPnet CVPR 2016

We tackle the image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on questions.
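
The idea of a layer whose weights depend on the question can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's implementation: a hypothetical linear `projection` maps a question embedding to a flat weight vector, which is reshaped into the weight matrix of a fully connected layer applied to image features.

```python
def predict_weights(question_emb, projection, out_dim, in_dim):
    """Map a question embedding to an (out_dim x in_dim) weight matrix.

    projection is a hypothetical matrix with out_dim * in_dim rows and
    len(question_emb) columns; each row yields one predicted weight.
    """
    flat = [sum(p * q for p, q in zip(row, question_emb)) for row in projection]
    return [flat[i * in_dim:(i + 1) * in_dim] for i in range(out_dim)]


def dynamic_layer(image_feat, weights):
    """Fully connected layer whose weights were predicted from the question."""
    return [sum(w * x for w, x in zip(row, image_feat)) for row in weights]


image_feat = [3.0, 5.0]
projection = [[1.0, 0.0], [0.0, 1.0]]   # toy values, chosen for readability

# The same image features give different outputs for different questions.
out_a = dynamic_layer(image_feat, predict_weights([1.0, 0.0], projection, 1, 2))
out_b = dynamic_layer(image_feat, predict_weights([0.0, 1.0], projection, 1, 2))
print(out_a, out_b)
```

The point of the design is that the image pathway is shared while the question decides which combination of image features the layer reads out.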

Automatic Spatially-aware Fashion Concept Discovery

naver/artemis ICCV 2017

This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites.

Attributes as Operators: Factorizing Unseen Attribute-Object Compositions

Tushar-N/attributes-as-operators ECCV 2018

In addition, we show that not only can our model recognize unseen compositions robustly in an open-world setting, it can also generalize to compositions where objects themselves were unseen during training.

Compositional Learning of Image-Text Query for Image Retrieval

ecom-research/ComposeAE 19 Jun 2020

In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query.

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

Monoxide-Chen/uncertainty_retrieval 14 Nov 2022

The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively.