Visual Dialog

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
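The task inputs described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `VisualDialogExample`, its field names, and the `flatten_context` helper are hypothetical and do not come from any of the repositories listed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One Visual Dialog query (hypothetical layout, for illustration only)."""
    image_id: str  # assumed identifier for the image being discussed
    # Earlier dialog rounds as (question, answer) pairs, oldest first.
    history: List[Tuple[str, str]] = field(default_factory=list)
    question: str = ""  # the follow-up question the agent must answer


def flatten_context(example: VisualDialogExample) -> str:
    """Join the dialog history and the follow-up question into one text string."""
    turns = [f"Q: {q} A: {a}" for q, a in example.history]
    turns.append(f"Q: {example.question}")
    return " ".join(turns)


example = VisualDialogExample(
    image_id="img_001",
    history=[("Is there a dog?", "Yes")],
    question="What color is it?",
)
context = flatten_context(example)
```

A model for this task would consume the image together with a text context like `context` and produce (or rank) an answer.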

Visual Dialog

facebookresearch/ParlAI CVPR 2017

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content.

Chatbot Visual Dialog

Hierarchical Question-Image Co-Attention for Visual Question Answering

jiasenlu/HieCoAttenVQA NeurIPS 2016

In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN).

Visual Dialog Visual Question Answering

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ronghanghu/n2nmn ICCV 2017

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

Visual Dialog Visual Question Answering

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

batra-mlp-lab/visdial-rl ICCV 2017

Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.

Visual Dialog Visual Question Answering

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

jiasenlu/visDial.pytorch NeurIPS 2017

In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses.
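To make "rank a list of candidate human responses" concrete, here is a minimal sketch of the ranking step alone. It assumes the dialog context and each candidate have already been encoded as fixed-length vectors and scores them by dot product; that scoring rule and the function name `rank_candidates` are assumptions for illustration, not the method used in the paper.

```python
from typing import List, Sequence


def rank_candidates(
    context_vec: Sequence[float],
    candidate_vecs: Sequence[Sequence[float]],
) -> List[int]:
    """Return candidate indices sorted from highest to lowest score.

    Scores are plain dot products between the context vector and each
    candidate vector (an assumed similarity measure).
    """
    scores = [
        sum(c * x for c, x in zip(context_vec, cand))
        for cand in candidate_vecs
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Three hypothetical candidate encodings scored against one context encoding.
order = rank_candidates([1.0, 0.0], [[0.2, 0.9], [0.9, 0.1], [0.5, 0.5]])
```

A generative model, by contrast, would produce the answer token by token rather than choosing from a fixed candidate list.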

Metric Learning Transfer Learning +1

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

vmurahari3/visdial-bert ECCV 2020

Next, we find that additional finetuning using "dense" annotations in VisDial leads to even higher NDCG -- more than 10% over our base model -- but hurts MRR -- more than 17% below our base model!
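The two metrics can move in opposite directions because they reward different things: MRR looks only at the rank of the single ground-truth answer, while NDCG credits every candidate that has a nonzero relevance score. The sketch below uses the standard textbook definitions of both metrics; the function names are hypothetical, and the choice of `k` is left to the caller rather than fixed to any benchmark's convention.

```python
import math
from typing import Sequence


def mean_reciprocal_rank(gt_ranks: Sequence[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the ground-truth answer."""
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg_at_k(relevance_in_ranked_order: Sequence[float], k: int) -> float:
    """DCG of the model's top-k ranking divided by the DCG of the ideal top-k."""
    ideal = dcg(sorted(relevance_in_ranked_order, reverse=True)[:k])
    if ideal == 0:
        return 0.0
    return dcg(relevance_in_ranked_order[:k]) / ideal


# Ground-truth answers ranked 1st, 2nd, and 4th in three dialog rounds.
mrr = mean_reciprocal_rank([1, 2, 4])

# Relevance scores of the candidates, listed in the order the model ranked them.
ndcg = ndcg_at_k([1.0, 0.0, 0.5], k=2)
```

A ranking that places several partially relevant answers above the ground-truth answer can therefore score well on NDCG while lowering MRR.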

Language Modelling Representation Learning +1

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

JXZe/DualVD 17 Nov 2019

More importantly, we can tell which modality (visual or semantic) has more contribution in answering the current question by visualizing the gate values.

Feature Selection Question Answering +2

Recursive Visual Attention in Visual Dialog

yuleiniu/rva CVPR 2019

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.

Question Answering Visual Dialog +1

Dialog-based Interactive Image Retrieval

XiaoxiaoGuo/fashion-retrieval NeurIPS 2018

Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.

Image Retrieval Visual Dialog