Visual Dialog

54 papers with code • 8 benchmarks • 10 datasets

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
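
As a rough sketch of this input/output structure, the following minimal example defines a dialog record and answers the follow-up question by ranking candidate answers with a caller-supplied scoring function. The field names and the candidate-ranking setup mirror the common VisDial-style evaluation but are illustrative assumptions, not an exact dataset schema.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class VisualDialogTurn:
    question: str
    answer: str

@dataclass
class VisualDialogExample:
    image_path: str                 # the grounding image
    caption: str                    # caption that seeds the dialog
    history: List[VisualDialogTurn] = field(default_factory=list)
    question: str = ""              # follow-up question to be answered
    candidate_answers: List[str] = field(default_factory=list)  # often 100 options

def answer_follow_up(example: VisualDialogExample,
                     score: Callable[..., float]) -> str:
    """Rank candidates with a model-provided scoring function
    score(image_path, caption, history, question, candidate) -> float
    and return the top-ranked answer."""
    return max(example.candidate_answers,
               key=lambda cand: score(example.image_path, example.caption,
                                      example.history, example.question, cand))
```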

Most implemented papers

Recursive Visual Attention in Visual Dialog

yuleiniu/rva CVPR 2019

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.

Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation

naver/aqm-plus ICLR 2019

Answerer in Questioner's Mind (AQM) is an information-theoretic framework that has been recently proposed for task-oriented dialog systems.
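
As a hedged illustration of that information-theoretic idea, the sketch below scores candidate questions by the mutual information between the unknown target and the expected answer, then picks the most informative question. The probability tables and function names are assumptions for illustration, not the AQM/AQM+ implementation.

```python
import math
from typing import Dict

def information_gain(prior: Dict[str, float],
                     answer_model: Dict[str, Dict[str, float]]) -> float:
    """Mutual information I(target; answer) for one candidate question.
    prior[target] is the questioner's current belief over targets;
    answer_model[target][answer] approximates the answerer's reply
    distribution given that target."""
    answers = {a for dist in answer_model.values() for a in dist}
    # Marginal answer distribution p(a) = sum_t p(t) * p(a | t)
    marginal = {a: sum(prior[t] * answer_model[t].get(a, 0.0) for t in prior)
                for a in answers}
    gain = 0.0
    for t, p_t in prior.items():
        for a, p_a_given_t in answer_model[t].items():
            if p_a_given_t > 0 and marginal[a] > 0:
                gain += p_t * p_a_given_t * math.log(p_a_given_t / marginal[a])
    return gain

def select_question(prior, answer_models_per_question):
    """Choose the candidate question with the largest expected information gain."""
    return max(answer_models_per_question,
               key=lambda q: information_gain(prior, answer_models_per_question[q]))
```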

Discourse Parsing in Videos: A Multi-modal Appraoch

arjunakula/Visual-Discourse-Parsing 6 Mar 2019

We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video.

CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

satwikkottur/clevr-dialog NAACL 2019

Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset.
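
A minimal sketch of what grounding a grammar in a scene graph can look like: question templates are filled with attributes read directly off the graph, so each generated question comes with a programmatically known answer. The schema and templates below are invented for illustration and are far simpler than the CLEVR-Dialog grammar.

```python
import random

# Toy scene graph: objects with a few attributes (invented schema).
scene_graph = {
    "objects": [
        {"shape": "cube", "color": "red", "size": "large"},
        {"shape": "sphere", "color": "blue", "size": "small"},
    ],
}

# Each template names the attribute slot that serves as the answer.
templates = [
    ("What color is the {size} {shape}?", "color"),
    ("What size is the {color} {shape}?", "size"),
]

def sample_grounded_qa(graph):
    """Sample one question whose answer can be read off the scene graph."""
    obj = random.choice(graph["objects"])
    template, answer_key = random.choice(templates)
    return template.format(**obj), obj[answer_key]

question, answer = sample_grounded_qa(scene_graph)
```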

Factor Graph Attention

idansc/fga CVPR 2019

We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities.
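
The toy sketch below illustrates the general idea of letting every utility (image regions, question tokens, history turns, and so on) attend to every other utility and pooling the result into one summary per utility. It is a plain pairwise-affinity illustration under assumed shapes, not the paper's factor-graph formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_all_utilities(utilities):
    """utilities: list of (n_i, d) arrays, one per data utility.
    Returns one attended (d,) summary per utility, where attention weights
    come from that utility's affinity with all other utilities."""
    summaries = []
    for i, u in enumerate(utilities):
        scores = np.zeros(u.shape[0])
        for j, v in enumerate(utilities):
            if i == j:
                continue
            # Mean affinity of each element of utility i with utility j.
            scores += (u @ v.T).mean(axis=1)
        weights = softmax(scores)
        summaries.append(weights @ u)
    return summaries
```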

Reasoning Visual Dialogs with Structural and Partial Observations

zilongzheng/visdial-gnn CVPR 2019

The answer to a given question is represented by a node with missing value.
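
A hedged sketch of that graph view: observed dialog entities (caption, past question-answer pairs, current question) are nodes with known features, the answer is a node with a missing value, and simple message passing fills it in. The dimensions, the averaging update, and the API are illustrative assumptions, not the paper's model.

```python
import numpy as np

def infer_missing_node(node_feats: np.ndarray, adjacency: np.ndarray,
                       observed: np.ndarray, steps: int = 5) -> np.ndarray:
    """node_feats: (N, D) node features; adjacency: (N, N) edge weights;
    observed: (N,) boolean mask. Unobserved nodes (the answer node) are
    repeatedly replaced by the weighted mean of their neighbours' features."""
    feats = node_feats.copy()
    feats[~observed] = 0.0
    for _ in range(steps):
        norm = adjacency.sum(axis=1, keepdims=True) + 1e-8
        messages = (adjacency @ feats) / norm
        feats[~observed] = messages[~observed]   # only hidden nodes are updated
    return feats
```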

Improving Generative Visual Dialog by Answering Diverse Questions

vmurahari3/visdial-diversity IJCNLP 2019

Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this 'self-talk' approach can lead to improved performance at the downstream dialog-conditioned image-guessing task.

TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines

Deanplayerljx/tab-vcr NeurIPS 2019

Despite impressive recent progress that has been reported on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets.

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

JXZe/DualVD 17 Nov 2019

More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values.
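
As a toy illustration of inspecting such gate values, the sketch below fuses a visual and a semantic feature through a single scalar sigmoid gate; the shapes and the scalar gate are assumptions for illustration, not DualVD's adaptive dual encoding.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(visual, semantic, w_gate, b_gate=0.0):
    """visual, semantic: (d,) feature vectors; w_gate: (2d,) gate weights.
    Returns the fused feature and the gate value in [0, 1]
    (closer to 1 means the visual modality dominates)."""
    gate = sigmoid(np.dot(w_gate, np.concatenate([visual, semantic])) + b_gate)
    fused = gate * visual + (1.0 - gate) * semantic
    return fused, gate
```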

An Annotated Corpus of Reference Resolution for Interpreting Common Grounding

Alab-NII/onecommon 18 Nov 2019

Common grounding is the process of creating, repairing and updating mutual understandings, which is a fundamental aspect of natural language conversation.