This work proposes a new method that uses semantically related questions, referred to as basic questions, as a form of noise to evaluate the robustness of Visual Question Answering (VQA) models.
This has incited research on the reaction of DNNs to noisy input, namely on developing adversarial input attacks and on training strategies that make DNNs robust to these attacks.
In this work, we propose a new training regularizer that minimizes the expected training loss of a DNN whose input is subject to generic Gaussian noise.
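Read as an objective, such a regularizer corresponds to E_{x ~ N(x0, sigma^2 I)}[l(f(x), y)]. The sketch below is only an illustration of a term of this form, approximating the expectation by Monte Carlo sampling; the function name, the noise scale sigma, and the sample count are our assumptions, not details taken from this work:

import torch

def gaussian_expected_loss(model, loss_fn, x, y, sigma=0.1, n_samples=8):
    # Monte Carlo estimate of E_{eps ~ N(0, sigma^2 I)}[loss_fn(model(x + eps), y)].
    total = 0.0
    for _ in range(n_samples):
        noisy_x = x + sigma * torch.randn_like(x)
        total = total + loss_fn(model(noisy_x), y)
    return total / n_samples

In training, a term like this would be added to (or substituted for) the clean-data loss, with sigma controlling the assumed input noise level.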
Despite the impressive performance of deep neural networks (DNNs) on numerous vision tasks, they still exhibit behaviours that are not yet well understood.
Moreover, we show how the derived analytic expressions can be used to systematically construct targeted and non-targeted adversarial attacks.
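For context, the standard formulation of the two attack types (a generic statement, not this work's specific expression-based construction) is:

\delta^{*} = \arg\max_{\|\delta\| \le \epsilon} \ell\big(f(x+\delta),\, y\big) \quad \text{(non-targeted)}, \qquad
\delta^{*} = \arg\min_{\|\delta\| \le \epsilon} \ell\big(f(x+\delta),\, y_{\text{target}}\big) \quad \text{(targeted)},

where f is the network, \ell the loss, y the true label, and y_{\text{target}} the attacker's chosen label.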
In VQA, adversarial attacks can target the image and/or the main question, and yet there is a lack of proper analysis of the latter.
VQA models should exhibit both high accuracy and high robustness.
Given a natural language question about an image, the first module takes this main question as input and outputs its basic questions.
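As an illustration only, since the text does not specify how the module selects basic questions, one simple realization embeds every question in a candidate pool and returns the k most similar questions to the main one; the cosine-similarity ranking, the question pool, and all names below are our assumptions:

from typing import List
import numpy as np

def rank_basic_questions(main_q_vec: np.ndarray,
                         pool_vecs: np.ndarray,
                         pool_questions: List[str],
                         k: int = 3) -> List[str]:
    # Cosine similarity between the main question and each pooled question.
    norms = np.linalg.norm(pool_vecs, axis=1) * np.linalg.norm(main_q_vec)
    sims = (pool_vecs @ main_q_vec) / np.maximum(norms, 1e-12)
    # Indices of the k most similar questions, in descending order.
    top_k = np.argsort(-sims)[:k]
    return [pool_questions[i] for i in top_k]

The returned basic questions could then be appended to the main question as noise when probing a VQA model's robustness.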