5 papers with code • 0 benchmarks • 0 datasets
These leaderboards are used to track progress in Network Interpretation.
We ask whether neural network interpretation methods can be fooled via adversarial model manipulation, defined as a fine-tuning step that aims to radically alter the explanations without hurting the accuracy of the original model, e.g., VGG19, ResNet50, or DenseNet121.
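A minimal sketch of such a manipulation objective, assuming a PyTorch classifier and plain gradient saliency as the explanation; the cosine-similarity penalty and the `lam` weight are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def saliency(model, x, y, create_graph=False):
    """Vanilla gradient saliency w.r.t. the input (one common explanation)."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y.unsqueeze(1)).sum()
    (grad,) = torch.autograd.grad(score, x, create_graph=create_graph)
    return grad.abs()

def manipulation_loss(model, frozen, x, y, lam=1.0):
    """Fine-tuning loss: keep accuracy via cross-entropy while pushing the
    model's explanations away from those of the original, frozen copy."""
    ce = F.cross_entropy(model(x), y)
    sal = saliency(model, x, y, create_graph=True)  # gradients flow to weights
    sal_ref = saliency(frozen, x, y).detach()       # original maps to move away from
    sim = F.cosine_similarity(sal.flatten(1), sal_ref.flatten(1), dim=1).mean()
    return ce + lam * sim  # minimizing similarity alters the explanations
```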
We demonstrate that training networks to have interpretable gradients improves their robustness to adversarial perturbations.
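As a hedged illustration of that training recipe, one common proxy for "interpretable gradients" is a penalty on the input gradient itself (double-backpropagation style); the L1 norm and the `beta` weight below are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def interpretable_gradient_loss(model, x, y, beta=0.01):
    """Cross-entropy plus an L1 penalty on input gradients, encouraging
    sparse, human-readable saliency maps."""
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # create_graph=True so the penalty itself backpropagates into the weights
    (grad,) = torch.autograd.grad(ce, x, create_graph=True)
    return ce + beta * grad.abs().mean()
```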
Recent works have empirically shown that adversarial examples can be hidden from neural network interpretability (i.e., crafted so that the interpretation maps remain visually similar), and that interpretability itself is susceptible to adversarial attacks.
Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce model size for edge computing.
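A sketch pairing those two techniques, assuming PyTorch's built-in magnitude pruning as the compression step; measuring attribution drift with cosine similarity is an illustrative choice rather than the paper's method.

```python
import copy
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def saliency(model, x, y):
    """Gradient-based input attribution for hindsight analysis."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y.unsqueeze(1)).sum()
    (grad,) = torch.autograd.grad(score, x)
    return grad.abs()

def attribution_drift_after_pruning(model, x, y, amount=0.5):
    """Magnitude-prune a copy of the model, then report how similar its
    input attributions stay to the original's (1.0 = unchanged)."""
    pruned = copy.deepcopy(model)
    for m in pruned.modules():
        if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d)):
            prune.l1_unstructured(m, name="weight", amount=amount)
    sim = F.cosine_similarity(
        saliency(model, x, y).flatten(1),
        saliency(pruned, x, y).flatten(1),
        dim=1,
    )
    return sim.mean()
```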
Visual question answering (VQA) is a hallmark of vision and language reasoning and a challenging task under the zero-shot setting.
However, neglecting the normalization of attributions, which is essential to their visualization, has been an obstacle to understanding and improving the robustness of feature attribution methods.
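For concreteness, a minimal sketch of the normalization step that typically precedes visualization; per-map min-max rescaling to [0, 1] is one common convention, assumed here rather than taken from the paper.

```python
import torch

def normalize_attribution(attr, eps=1e-8):
    """Min-max normalize each attribution map in a batch to [0, 1],
    the form in which maps are usually rendered and compared."""
    flat = attr.flatten(1)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    return ((flat - lo) / (hi - lo + eps)).view_as(attr)
```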