Visual Dialog

54 papers with code • 8 benchmarks • 10 datasets

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
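
As a rough illustration of the task inputs described above, the following minimal Python sketch bundles an image reference, a dialog history, and a follow-up question into one record and flattens it into a text prompt. The names `VisualDialogExample` and `build_prompt`, the placeholder image path, and the prompt layout are all assumptions made for illustration; they do not come from any dataset or library listed on this page.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One Visual Dialog query (hypothetical container, not a dataset format)."""
    image_path: str                                   # reference to the image being discussed
    history: List[Tuple[str, str]] = field(default_factory=list)  # earlier (question, answer) rounds
    question: str = ""                                # follow-up question to be answered


def build_prompt(example: VisualDialogExample) -> str:
    """Flatten the image reference, history, and new question into one text prompt."""
    lines = [f"[image: {example.image_path}]"]
    for past_question, past_answer in example.history:
        lines.append(f"Q: {past_question}")
        lines.append(f"A: {past_answer}")
    lines.append(f"Q: {example.question}")
    lines.append("A:")  # left open for the agent's answer
    return "\n".join(lines)


if __name__ == "__main__":
    example = VisualDialogExample(
        image_path="img.jpg",  # placeholder path, assumed for the example
        history=[("What is in the picture?", "A dog")],
        question="What color is it?",
    )
    print(build_prompt(example))
```

A real model would consume image features rather than a file path; the sketch only shows how the three inputs named in the task description relate to each other.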

Benchmarks

These leaderboards track progress in Visual Dialog.

Dataset	Best Model
VisDial v0.9 val	9xFGA (VGG)
Visual Dialog v1.0 test-std	Single
VisDial v1.0 test-std	5xFGA + LS*+
ConvAI2	Multi-Modal BlenderBot
EmpatheticDialogues	Multi-Modal BlenderBot
Wizard of Wikipedia	Multi-Modal BlenderBot
BlendedSkillTalk	Multi-Modal BlenderBot
Image-Chat	Multi-Modal BlenderBot

Libraries

Use these libraries to find Visual Dialog models and implementations.

naver/aqm-plus

3 papers

kdexd/lang-emerge-parlai

2 papers

zihaow123/unimm

2 papers

Datasets

Most implemented papers

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

dvlab-research/minigemini • 27 Mar 2024

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ronghanghu/n2nmn • ICCV 2017

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

jiasenlu/visDial.pytorch • NeurIPS 2017

In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses.

Examining Cooperation in Visual Dialog Models

danakianfar/Examining-Cooperation-in-VDM • 4 Dec 2017

In this work we propose a blackbox intervention method for visual dialog models, with the aim of assessing the contribution of individual linguistic or visual components.

Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog

naver/aqm-plus • NeurIPS 2018

Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take.

Dialog-based Interactive Image Retrieval

XiaoxiaoGuo/fashion-retrieval • NeurIPS 2018

Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.

Ask No More: Deciding when to guess in referential visual dialogue

shekharRavi/ask-no-more-COLING2018 • COLING 2018

We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation to make a guess.

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

ruizhaogit/GuessWhat-TemperedPolicyGradient • 2 Jul 2018

Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic.

Visual Reasoning with Multi-hop Feature Modulation

GuessWhatGame/referit • ECCV 2018

Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue.

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

facebookresearch/corefnmn • ECCV 2018

Visual dialog entails answering a series of questions grounded in an image, using dialog history as context.

