About

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
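
As an illustration of the task inputs described above, here is a minimal sketch of how one query might be represented. The class name `VisualDialogExample`, its fields, and the `build_context` helper are hypothetical, chosen for illustration only; they are not taken from any of the repositories listed on this page.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One query to a visual dialog agent (hypothetical structure)."""
    image_path: str  # reference to the image under discussion
    history: List[Tuple[str, str]] = field(default_factory=list)  # earlier (question, answer) rounds
    question: str = ""  # the follow-up question the agent must answer


def build_context(example: VisualDialogExample) -> str:
    """Flatten the dialog history and the follow-up question into one text string."""
    turns = [f"Q: {q} A: {a}" for q, a in example.history]
    turns.append(f"Q: {example.question} A:")
    return " ".join(turns)
```

An agent would consume the image together with this text context and produce (or rank) an answer to the final question.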

Greatest papers with code

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ICCV 2017 tensorflow/models

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

VISUAL DIALOG VISUAL QUESTION ANSWERING

Visual Dialog

CVPR 2017 facebookresearch/ParlAI

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content.

CHATBOT VISUAL DIALOG

Hierarchical Question-Image Co-Attention for Visual Question Answering

NeurIPS 2016 jiasenlu/HieCoAttenVQA

In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN).

VISUAL DIALOG VISUAL QUESTION ANSWERING

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

ICCV 2017 batra-mlp-lab/visdial-rl

Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.

VISUAL DIALOG VISUAL QUESTION ANSWERING

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

NeurIPS 2017 jiasenlu/visDial.pytorch

In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses.

METRIC LEARNING TRANSFER LEARNING VISUAL DIALOG

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

ECCV 2020 vmurahari3/visdial-bert

Next, we find that additional finetuning using "dense" annotations in VisDial leads to even higher NDCG -- more than 10% over our base model -- but hurts MRR -- more than 17% below our base model!

LANGUAGE MODELLING REPRESENTATION LEARNING VISUAL DIALOG
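
The NDCG and MRR figures quoted in the entry above are both ranking metrics over a list of candidate answers. The following is a minimal sketch using the generic textbook definitions. It is not the official VisDial evaluation code, and the function names, the linear gain, and the cutoff parameter `k` are assumptions made for illustration.

```python
import math
from typing import Sequence


def mean_reciprocal_rank(gt_ranks: Sequence[int]) -> float:
    """gt_ranks[i] is the 1-based rank assigned to the single ground-truth answer in round i."""
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of relevances listed in ranked order (linear gain)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(scores: Sequence[float], relevances: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the model's top-k candidates divided by the best achievable DCG@k."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order[:k]]
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

MRR rewards placing one specific ground-truth answer near the top, while NDCG gives graded credit to every candidate with non-zero relevance, so the two metrics can move in opposite directions for the same model.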

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

17 Nov 2019 JXZe/DualVD

More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values.

FEATURE SELECTION QUESTION ANSWERING VISUAL DIALOG VISUAL QUESTION ANSWERING

Recursive Visual Attention in Visual Dialog

CVPR 2019 yuleiniu/rva

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.

QUESTION ANSWERING VISUAL DIALOG VISUAL QUESTION ANSWERING

Dialog-based Interactive Image Retrieval

NeurIPS 2018 XiaoxiaoGuo/fashion-retrieval

Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.

IMAGE RETRIEVAL VISUAL DIALOG