This task addresses retrieving images from a database given a multi-modal (image-text) query. Specifically, the query text describes a modification to the query image, and the goal is to retrieve images that exhibit the desired modification.
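A common high-level recipe for this task is to compose the image and text embeddings into a single query vector and rank database images by similarity to it. A minimal sketch, assuming a hypothetical learned projection `W` as the composition function (real methods use more elaborate, learned composition modules):

```python
import numpy as np

def compose(img_emb, txt_emb, W):
    # Hypothetical composition: project the concatenated image and
    # text embeddings into the shared retrieval space.
    return W @ np.concatenate([img_emb, txt_emb])

def retrieve(query_emb, db_embs):
    # Rank database images by cosine similarity to the composed query.
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    return np.argsort(-(db @ q))  # database indices, best match first

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, 2 * d)) * 0.1   # placeholder learned weights
img, txt = rng.standard_normal(d), rng.standard_normal(d)
db = rng.standard_normal((100, d))          # placeholder database embeddings

ranking = retrieve(compose(img, txt, W), db)
assert ranking.shape == (100,)
```

In practice the embeddings come from trained image and text encoders, and the composition module is learned end-to-end with a retrieval loss.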
Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.
Ranked #3 on Image Retrieval with Multi-Modal Query on MIT-States
Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn.
In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.
Ranked #2 on Image Retrieval with Multi-Modal Query on MIT-States
We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation.
Ranked #3 on Visual Question Answering on CLEVR-Humans
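The core FiLM operation is simple: a conditioning input (e.g. a question embedding) predicts a per-channel scale and shift that modulate the network's feature maps. A minimal sketch, where the single linear predictor `W` stands in for the paper's learned conditioning network:

```python
import numpy as np

def film(features, gamma, beta):
    # Feature-wise linear modulation: scale and shift each channel.
    # features: (channels, height, width); gamma, beta: (channels,)
    return gamma[:, None, None] * features + beta[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))   # 4 feature maps from a conv layer
cond = rng.standard_normal(16)       # conditioning vector (e.g. text embedding)

# Hypothetical conditioning layer predicting gamma and beta from cond
W = rng.standard_normal((8, 16)) * 0.1
gb = W @ cond
gamma, beta = gb[:4], gb[4:]

out = film(x, gamma, beta)
assert out.shape == x.shape
```

With `gamma = 1` and `beta = 0` the layer is an identity, so the modulation can smoothly interpolate between leaving features untouched and strongly reweighting them.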
We tackle the image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on the question.
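The idea of a dynamic parameter layer can be sketched as a fully-connected layer whose weight matrix is predicted from the question embedding rather than stored as fixed parameters. This is a simplified illustration with a hypothetical linear predictor `W_pred`; the paper additionally uses a parameter-hashing trick to keep the number of predicted weights manageable, which is omitted here:

```python
import numpy as np

def dynamic_fc(visual_feat, question_emb, W_pred, n_out):
    # Predict this layer's weight matrix from the question embedding,
    # then apply it to the visual features.
    n_in = visual_feat.shape[0]
    weights = (W_pred @ question_emb).reshape(n_out, n_in)
    return weights @ visual_feat

rng = np.random.default_rng(1)
v = rng.standard_normal(5)                       # visual feature vector
q = rng.standard_normal(7)                       # question embedding
W_pred = rng.standard_normal((3 * 5, 7)) * 0.1   # hypothetical weight predictor

out = dynamic_fc(v, q, W_pred, n_out=3)
assert out.shape == (3,)
```

Different questions thus yield different effective classifiers over the same visual features, which is the mechanism the abstract describes.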
In addition, we show that not only can our model recognize unseen compositions robustly in an open-world setting, but it can also generalize to compositions whose objects were themselves unseen during training.
Ranked #5 on Image Retrieval with Multi-Modal Query on MIT-States
In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query.
Ranked #1 on Image Retrieval with Multi-Modal Query on MIT-States