We study the problem of geometric reasoning in the context of question
answering. We introduce the Dynamic Spatial Memory Network (DSMN), a new
deep network architecture designed for answering questions that admit latent
visual representations. DSMN learns to generate and reason over such
representations. Further, we propose two synthetic benchmarks, FloorPlanQA and
ShapeIntersection, to evaluate the geometric reasoning capability of QA
systems. Experimental results validate the effectiveness of DSMN on visual
thinking tasks.