Visual Dialog

54 papers with code • 8 benchmarks • 10 datasets

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
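
The inputs named above (an image, a dialog history, and a follow-up question) can be sketched as a small data structure. This is only an illustrative sketch, not the format of any particular dataset or model: the class name VisualDialogExample, the helper build_text_context, the "Q: ... A: ..." layout, and the image path are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One Visual Dialog query (hypothetical container, not a dataset schema)."""
    image_path: str  # placeholder path to the image being discussed
    history: List[Tuple[str, str]] = field(default_factory=list)  # earlier (question, answer) rounds
    question: str = ""  # the follow-up question to be answered


def build_text_context(example: VisualDialogExample) -> str:
    """Flatten the dialog history and the follow-up question into one string.

    The "Q: ... A: ..." layout is an assumption chosen for readability.
    """
    lines = [f"Q: {q} A: {a}" for q, a in example.history]
    # The final line leaves the answer slot open for the model to fill in.
    lines.append(f"Q: {example.question} A:")
    return "\n".join(lines)


example = VisualDialogExample(
    image_path="images/example.jpg",  # hypothetical file
    history=[("How many people are in the picture?", "Two.")],
    question="Are they standing?",
)
context = build_text_context(example)
print(context)
```

A model for the task would consume the image together with a text context like this one and produce (or rank) an answer to the final question.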

Most implemented papers

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

dvlab-research/minigemini 27 Mar 2024

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ronghanghu/n2nmn ICCV 2017

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

jiasenlu/visDial.pytorch NeurIPS 2017

In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses.

Examining Cooperation in Visual Dialog Models

danakianfar/Examining-Cooperation-in-VDM 4 Dec 2017

In this work we propose a blackbox intervention method for visual dialog models, with the aim of assessing the contribution of individual linguistic or visual components.

Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog

naver/aqm-plus NeurIPS 2018

Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take.

Dialog-based Interactive Image Retrieval

XiaoxiaoGuo/fashion-retrieval NeurIPS 2018

Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.

Ask No More: Deciding when to guess in referential visual dialogue

shekharRavi/ask-no-more-COLING2018 COLING 2018

We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation to make a guess.

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

ruizhaogit/GuessWhat-TemperedPolicyGradient 2 Jul 2018

Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic.

Visual Reasoning with Multi-hop Feature Modulation

GuessWhatGame/referit ECCV 2018

Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue.

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

facebookresearch/corefnmn ECCV 2018

Visual dialog entails answering a series of questions grounded in an image, using dialog history as context.