Image Retrieval with Multi-Modal Query
9 papers with code • 3 benchmarks • 2 datasets
The problem of retrieving images from a database based on a multi-modal (image- text) query. Specifically, the query text prompts some modification in the query image and the task is to retrieve images with the desired modifications.
Subtasks
Latest papers
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively.
Compositional Learning of Image-Text Query for Image Retrieval
In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query.
Composing Text and Image for Image Retrieval - An Empirical Odyssey
In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.
Attributes as Operators: Factorizing Unseen Attribute-Object Compositions
In addition, we show that not only can our model recognize unseen compositions robustly in an open-world setting, it can also generalize to compositions where objects themselves were unseen during training.
FiLM: Visual Reasoning with a General Conditioning Layer
We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation.
Automatic Spatially-aware Fashion Concept Discovery
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites.
A simple neural network module for relational reasoning
Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn.
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
We tackle image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on questions.
Show and Tell: A Neural Image Caption Generator
Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.