About

Image retrieval with a multi-modal query is the problem of retrieving images from a database using a multi-modal (image-text) query. The query text describes a modification to the query image, and the task is to retrieve images that show the desired modification.
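The task can be illustrated with a minimal sketch, assuming that images and text have already been mapped to vectors in a shared embedding space. The toy three-dimensional embeddings, the `compose` function (plain element-wise addition), and the names `retrieve` and `database` are all hypothetical placeholders chosen for illustration; the methods listed on this page learn their composition functions rather than using addition.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compose(image_vec, text_vec):
    """Combine the query image and query text into one query vector.

    Assumption: element-wise addition stands in for a learned composition.
    """
    return [a + b for a, b in zip(image_vec, text_vec)]


def retrieve(image_vec, text_vec, database, top_k=2):
    """Rank database images by similarity to the composed query."""
    query = compose(image_vec, text_vec)
    ranked = sorted(
        database.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Toy embeddings with dimensions [is_dress, is_red, is_blue] (hypothetical).
database = {
    "red_dress": [1.0, 1.0, 0.0],
    "blue_dress": [1.0, 0.0, 1.0],
    "blue_shirt": [0.0, 0.0, 1.0],
}
query_image = [1.0, 1.0, 0.0]   # a red dress
query_text = [0.0, -1.0, 1.0]   # "make it blue"

results = retrieve(query_image, query_text, database)
print(results)
```

In this toy setup the composed query lands on the blue dress, so it ranks first even though the query image itself is the red dress.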

Greatest papers with code

Show and Tell: A Neural Image Caption Generator

CVPR 2015 karpathy/neuraltalk

Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.

IMAGE CAPTIONING | IMAGE RETRIEVAL WITH MULTI-MODAL QUERY | TEXT GENERATION

Composing Text and Image for Image Retrieval - An Empirical Odyssey

CVPR 2019 google/tirg

In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.

IMAGE RETRIEVAL | IMAGE RETRIEVAL WITH MULTI-MODAL QUERY

FiLM: Visual Reasoning with a General Conditioning Layer

22 Sep 2017 ethanjperez/film

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation.

IMAGE RETRIEVAL WITH MULTI-MODAL QUERY | VISUAL QUESTION ANSWERING | VISUAL REASONING
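Feature-wise linear modulation can be sketched in a few lines: each feature map is scaled by a coefficient gamma and shifted by a coefficient beta, and both coefficients are computed from a conditioning input such as a text embedding. The sketch below is not the ethanjperez/film implementation. The linear generator `predict_film_params`, its weight matrices, and the toy values are assumptions made only for illustration.

```python
def predict_film_params(condition, w_gamma, w_beta):
    """Map a conditioning vector to one (gamma, beta) pair per feature map.

    Assumption: a plain linear map stands in for the learned generator.
    """
    gamma = [sum(w * c for w, c in zip(row, condition)) for row in w_gamma]
    beta = [sum(w * c for w, c in zip(row, condition)) for row in w_beta]
    return gamma, beta


def film(features, gamma, beta):
    """Apply gamma * x + beta to every value of each feature map."""
    return [
        [g * x + b for x in feature_map]
        for feature_map, g, b in zip(features, gamma, beta)
    ]


# Two feature maps with two values each (toy values).
features = [[1.0, 2.0], [3.0, 4.0]]
condition = [1.0, 2.0]                  # e.g. a text embedding (hypothetical)
w_gamma = [[2.0, 0.0], [0.0, 0.5]]      # hypothetical generator weights
w_beta = [[0.0, 1.0], [-1.0, 0.0]]

gamma, beta = predict_film_params(condition, w_gamma, w_beta)
modulated = film(features, gamma, beta)
print(gamma, beta, modulated)
```

Because gamma and beta depend only on the conditioning input, the same feature maps are transformed differently for different text, which is what makes the layer usable for combining an image with a text query.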

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

CVPR 2016 HyeonwooNoh/DPPnet

We tackle the image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on questions.

IMAGE RETRIEVAL WITH MULTI-MODAL QUERY | QUESTION ANSWERING | VISUAL QUESTION ANSWERING
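The idea of a layer whose weights depend on the question can be sketched as follows. This is a simplified illustration, not the HyeonwooNoh/DPPnet code: the linear `predict_weights` generator, the vector sizes, and the toy values are hypothetical, and the paper's actual parameter prediction network is not reproduced here.

```python
def predict_weights(question_vec, generator, out_dim, in_dim):
    """Produce an out_dim x in_dim weight matrix from a question vector.

    Assumption: `generator` is a plain linear map with one row per weight.
    """
    flat = [sum(g * q for g, q in zip(row, question_vec)) for row in generator]
    return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]


def dynamic_layer(image_feat, weights):
    """Fully connected layer applied with the question-dependent weights."""
    return [sum(w * x for w, x in zip(row, image_feat)) for row in weights]


# Hypothetical generator: 4 weights (2 x 2 layer), 2-dimensional questions.
generator = [
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
]
image_feat = [3.0, 5.0]
question_a = [1.0, 0.0]
question_b = [0.0, 1.0]

out_a = dynamic_layer(image_feat, predict_weights(question_a, generator, 2, 2))
out_b = dynamic_layer(image_feat, predict_weights(question_b, generator, 2, 2))
print(out_a, out_b)
```

The same image features produce different outputs for the two questions because the layer's weights are regenerated for each question.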

Attributes as Operators: Factorizing Unseen Attribute-Object Compositions

ECCV 2018 Tushar-N/attributes-as-operators

In addition, we show that not only can our model recognize unseen compositions robustly in an open-world setting, it can also generalize to compositions where objects themselves were unseen during training.

COMPOSITIONAL ZERO-SHOT LEARNING | IMAGE RETRIEVAL WITH MULTI-MODAL QUERY

Compositional Learning of Image-Text Query for Image Retrieval

19 Jun 2020 ecom-research/ComposeAE

In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query.

IMAGE RETRIEVAL | IMAGE RETRIEVAL WITH MULTI-MODAL QUERY | METRIC LEARNING