27 Jul 2020 • Siwen Luo, Soyeon Caren Han, Kaiyuan Sun, Josiah Poon
Visual question answering (VQA) is a challenging multi-modal task that requires not only semantic understanding of both images and questions, but also a sound, step-by-step reasoning process that leads to the correct answer.