Image Retrieval with Multi-Modal Query
9 papers with code • 3 benchmarks • 2 datasets
The problem of retrieving images from a database based on a multi-modal (image-text) query. Specifically, the query text prompts some modification to the query image, and the task is to retrieve images with the desired modifications.
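The task can be illustrated with a minimal retrieval sketch: embed the query image and the modification text, compose the two embeddings, and rank database images by cosine similarity. The additive composition and all names below are illustrative assumptions; published methods learn the composition function rather than simply summing.

```python
import numpy as np

def compose_query(image_emb, text_emb):
    """Naive additive composition of image and text embeddings
    (illustrative only; real systems learn this composition)."""
    q = image_emb + text_emb
    return q / np.linalg.norm(q)

def retrieve(query_emb, database_embs, top_k=3):
    """Rank database images by cosine similarity to the composed query."""
    db = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    scores = db @ query_emb
    return np.argsort(-scores)[:top_k]

# Toy example: 5 database images with 4-d embeddings.
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 4))
image_emb = rng.normal(size=4)
text_emb = rng.normal(size=4)  # would encode e.g. "make the dress red"

ranking = retrieve(compose_query(image_emb, text_emb), db)
print(ranking)
```

The point of the sketch is only the pipeline shape: one vector per modality, one composed query vector, one nearest-neighbour search over the database.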
Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn.
In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.
We tackle the image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on the question.
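The idea of a dynamic parameter layer can be sketched as a small hypernetwork: a fixed matrix maps the question embedding to the (flattened) weights of a fully-connected layer, which is then applied to the image feature. All dimensions and names here are hypothetical placeholders, not the paper's actual architecture or sizes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions, chosen only for illustration.
q_dim, feat_dim, n_answers = 8, 16, 10

# Parameter-prediction matrix: maps a question embedding to the
# flattened weights of the dynamic fully-connected layer.
hyper_W = rng.normal(scale=0.1, size=(feat_dim * n_answers, q_dim))

def dynamic_layer(image_feat, question_emb):
    """Apply a fully-connected layer whose weights are predicted
    from the question embedding (a hypernetwork-style sketch)."""
    W = (hyper_W @ question_emb).reshape(n_answers, feat_dim)
    return W @ image_feat  # one score per candidate answer

scores = dynamic_layer(rng.normal(size=feat_dim), rng.normal(size=q_dim))
print(scores.shape)
```

Because the layer's weights change with every question, the same image feature can be classified differently depending on what is being asked.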
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites.
In addition, we show that not only can our model robustly recognize unseen compositions in an open-world setting, but it can also generalize to compositions whose objects were themselves unseen during training.
The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively.