no code implementations • 20 Jan 2020 • Moshiur R. Farazi, Salman H. Khan, Nick Barnes
However, modelling the visual and semantic features in a high-dimensional (joint embedding) space is computationally expensive, and more complex models often yield only marginal improvements in VQA accuracy.
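A minimal sketch of the kind of joint-embedding fusion the abstract refers to: image and question features are projected into a shared space and combined before answer classification. The dimensions and the element-wise (Hadamard) fusion are illustrative assumptions, not the paper's model; note how the cost grows with the joint dimension.

```python
import torch
import torch.nn as nn

class JointEmbeddingFusion(nn.Module):
    """Illustrative VQA fusion baseline (assumed dimensions, not the paper's architecture)."""
    def __init__(self, v_dim=2048, q_dim=1024, joint_dim=512, n_answers=3000):
        super().__init__()
        self.v_proj = nn.Linear(v_dim, joint_dim)   # project image features into the joint space
        self.q_proj = nn.Linear(q_dim, joint_dim)   # project question features into the joint space
        self.classifier = nn.Linear(joint_dim, n_answers)

    def forward(self, v, q):
        # Element-wise product in the joint space; richer fusions (e.g. bilinear
        # pooling) or a larger joint_dim quickly increase compute and memory,
        # which is the accuracy-vs-complexity trade-off the abstract discusses.
        joint = torch.tanh(self.v_proj(v)) * torch.tanh(self.q_proj(q))
        return self.classifier(joint)
```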
no code implementations • 9 Aug 2019 • Moshiur R. Farazi, Salman H. Khan, Nick Barnes
Visual Question Answering (VQA) models employ attention mechanisms to discover image locations that are most relevant for answering a specific question.
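As a reference point for the attention mechanism described above, here is a generic question-guided soft attention over image regions; layer sizes are assumptions and this is a standard baseline sketch, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedAttention(nn.Module):
    """Generic soft attention over image regions, conditioned on the question."""
    def __init__(self, v_dim=2048, q_dim=1024, hidden=512):
        super().__init__()
        self.v_proj = nn.Linear(v_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, v, q):
        # v: (batch, regions, v_dim) region features, q: (batch, q_dim) question embedding
        joint = torch.tanh(self.v_proj(v) + self.q_proj(q).unsqueeze(1))
        weights = F.softmax(self.score(joint), dim=1)   # one relevance weight per region
        attended = (weights * v).sum(dim=1)             # question-specific image summary
        return attended, weights
```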
no code implementations • 30 Nov 2018 • Moshiur R. Farazi, Salman H. Khan, Nick Barnes
To evaluate our model, we propose a new split for VQA, separating Unknown visual and semantic concepts from the training set.
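One way such a split could be constructed, sketched below under the assumption that an answer string is used as a proxy for a visual/semantic concept: evaluation questions whose answer never occurs in training are held out as "Unknown". The exact criterion in the paper may differ, and the field names follow the common VQA annotation layout.

```python
def make_unknown_split(train_qa, eval_qa):
    """Separate eval QA pairs into Known/Unknown by whether the answer concept
    was seen during training (hypothetical criterion for illustration)."""
    known_answers = {qa["answer"] for qa in train_qa}
    known, unknown = [], []
    for qa in eval_qa:
        (known if qa["answer"] in known_answers else unknown).append(qa)
    return known, unknown
```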
no code implementations • 11 May 2018 • Moshiur R. Farazi, Salman H. Khan
Existing attention mechanisms for Visual Question Answering (VQA) attend to either local image-grid features or object-level features.
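For illustration, the sketch below attends over both a CNN feature grid and detector object proposals with the same question embedding and concatenates the two attended summaries. It is a hypothetical combination under assumed dimensions, not the architecture proposed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_attention(feats, q_emb, proj_f, proj_q, score):
    """feats: (B, N, d) grid cells or object proposals; q_emb: (B, q_dim)."""
    joint = torch.tanh(proj_f(feats) + proj_q(q_emb).unsqueeze(1))
    weights = F.softmax(score(joint), dim=1)   # (B, N, 1) weight per region
    return (weights * feats).sum(dim=1)        # (B, d) attended summary

class GridAndObjectAttention(nn.Module):
    """Attend separately over grid and object features, then concatenate."""
    def __init__(self, d=2048, q_dim=1024, hidden=512):
        super().__init__()
        # separate attention parameters for grid cells and object proposals
        self.pf_g, self.pq_g, self.s_g = nn.Linear(d, hidden), nn.Linear(q_dim, hidden), nn.Linear(hidden, 1)
        self.pf_o, self.pq_o, self.s_o = nn.Linear(d, hidden), nn.Linear(q_dim, hidden), nn.Linear(hidden, 1)

    def forward(self, grid_feats, obj_feats, q_emb):
        g = soft_attention(grid_feats, q_emb, self.pf_g, self.pq_g, self.s_g)
        o = soft_attention(obj_feats, q_emb, self.pf_o, self.pq_o, self.s_o)
        return torch.cat([g, o], dim=-1)   # combined visual summary for answer prediction
```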