A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input

NeurIPS 2014 · Mateusz Malinowski, Mario Fritz

We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision. We combine discrete reasoning with uncertain predictions through a multi-world approach that represents uncertainty about the perceived world in a Bayesian framework. Our approach can handle human questions of high complexity about realistic scenes and replies with a range of answers, such as counts, object classes, instances, and lists of them. The system is trained directly from question-answer pairs. We establish a first benchmark for this task, which can be seen as a modern attempt at a visual Turing test.
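
The multi-world idea in the abstract can be illustrated with a toy sketch. It is not the authors' system: it assumes each image segment carries an independent categorical belief over object labels, hard-codes the question as a counting query instead of parsing it from text, and enumerates all worlds exactly where a practical system would more likely sample them. The names `answer_posterior`, `count_of`, and the example beliefs are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
from itertools import product


def answer_posterior(segment_beliefs, query):
    """Marginalize a deterministic query over every possible world.

    segment_beliefs: list of dicts mapping label -> probability, one per
        segment (assumed independent; this is a simplification).
    query: function taking a tuple of labels (one world) and returning
        a discrete answer.
    """
    posterior = defaultdict(float)
    options = [list(belief.items()) for belief in segment_beliefs]
    for world in product(*options):
        labels = tuple(label for label, _ in world)
        prob = 1.0
        for _, p in world:
            prob *= p  # probability of this world under independence
        posterior[query(labels)] += prob  # discrete reasoning in one world
    return dict(posterior)


def count_of(target):
    """Build a counting query such as 'how many chairs are there?'."""
    return lambda labels: sum(1 for label in labels if label == target)


# Hypothetical uncertain perception of three segments.
beliefs = [
    {"chair": 0.8, "table": 0.2},
    {"chair": 0.5, "sofa": 0.5},
    {"table": 1.0},
]

posterior = answer_posterior(beliefs, count_of("chair"))
best_answer = max(posterior, key=posterior.get)
```

Reasoning in a single most-likely world would commit to one labeling before answering. Summing over worlds instead lets the uncertainty of the perception carry through to a distribution over answers, from which the most probable answer is read off.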


Datasets


Introduced in the Paper:

DAQUAR

Used in the Paper:

NYUv2

