Visual Dialog

49 papers with code • 9 benchmarks • 10 datasets

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
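
The inputs named above (an image, a dialog history, and a follow-up question) can be pictured as a small record. The sketch below is illustrative only: the class name `VisualDialogExample`, its fields, and the helper `build_text_context` are hypothetical and are not taken from any dataset or library listed on this page.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One query to a Visual Dialog agent (hypothetical layout)."""
    image_path: str                                   # the image under discussion
    history: List[Tuple[str, str]] = field(default_factory=list)  # past (question, answer) rounds
    question: str = ""                                # the follow-up question to answer


def build_text_context(example: VisualDialogExample) -> str:
    """Flatten the dialog history and the current question into one string."""
    lines = [f"Q: {q} A: {a}" for q, a in example.history]
    lines.append(f"Q: {example.question} A:")
    return "\n".join(lines)


example = VisualDialogExample(
    image_path="img.jpg",
    history=[("What color is the cat?", "Black")],
    question="Is it sleeping?",
)
context = build_text_context(example)
```

A model would consume `context` together with features of the image at `image_path` and produce (or rank) an answer; how it does so differs between the papers listed below.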

Most implemented papers

Visual Dialog

batra-mlp-lab/visdial-amt-chat CVPR 2017

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content.

Hierarchical Question-Image Co-Attention for Visual Question Answering

jiasenlu/HieCoAttenVQA NeurIPS 2016

In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN).

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

batra-mlp-lab/visdial-rl ICCV 2017

Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.

Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge 1 Jun 2018

Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.

Visual Dialogue without Vision or Dialogue

danielamassiceti/CCA-visualdialogue 16 Dec 2018

We characterise some of the quirks and shortcomings in the exploration of Visual Dialogue - a sequential question-answering task where the questions and corresponding answers are related through given visual stimuli.

Dual Attention Networks for Visual Reference Resolution in Visual Dialog

gicheonkang/DAN-VisDial IJCNLP 2019

Specifically, the REFER module learns latent relationships between a given question and a dialog history by employing a self-attention mechanism.

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

vmurahari3/visdial-bert ECCV 2020

Next, we find that additional finetuning using "dense" annotations in VisDial leads to even higher NDCG -- more than 10% over our base model -- but hurts MRR -- more than 17% below our base model!
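
The two metrics named above can move in opposite directions because they reward different things: MRR looks only at the rank of the single ground-truth answer, while NDCG credits every candidate that annotators marked as relevant. The sketch below shows one common formulation of each. It assumes NDCG is truncated at k, the number of candidates with non-zero relevance; the function names and toy numbers are illustrative and are not taken from the paper or its repository.

```python
import math
from typing import List


def mean_reciprocal_rank(gt_ranks: List[int]) -> float:
    """gt_ranks holds the 1-based rank of the ground-truth answer per round."""
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def ndcg(scores: List[float], relevance: List[float]) -> float:
    """NDCG@k with k = number of candidates with non-zero relevance (assumed)."""
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0
    # Candidate indices sorted by model score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ideal = sorted(relevance, reverse=True)
    dcg = sum(relevance[i] / math.log2(pos + 2) for pos, i in enumerate(order[:k]))
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal[:k]))
    return dcg / idcg


mrr = mean_reciprocal_rank([1, 2, 4])                 # (1 + 1/2 + 1/4) / 3
perfect = ndcg([0.9, 0.1, 0.5], [1.0, 0.0, 0.5])      # ranking matches relevance order
worst = ndcg([0.1, 0.9], [1.0, 0.0])                  # the only relevant answer is ranked last
```

A ranking that places several relevant answers above the ground-truth one can score well on NDCG while lowering MRR, which is consistent with the trade-off the abstract reports.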

History for Visual Dialog: Do we really need it?

shubhamagarwal92/visdial_conv ACL 2020

Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.

Where Are You? Localization from Embodied Dialog

meera1hahn/Graph_LED EMNLP 2020

In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ronghanghu/n2nmn ICCV 2017

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.