Network Interpretation

5 papers with code • 0 benchmarks • 0 datasets


Most implemented papers

Fooling Neural Network Interpretations via Adversarial Model Manipulation

rmrisforbidden/Fooling_Neural_Network-Interpretations NeurIPS 2019

We ask whether neural network interpretation methods can be fooled via adversarial model manipulation, defined as a fine-tuning step that aims to radically alter the explanations without hurting the accuracy of the original models, e.g., VGG19, ResNet50, and DenseNet121.

An Empirical Study on the Relation between Network Interpretability and Adversarial Robustness

a1noack/interp_regularization 7 Dec 2019

We demonstrate that training the networks to have interpretable gradients improves their robustness to adversarial perturbations.
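The idea of penalizing input gradients so that they double as interpretable saliency maps can be illustrated on a toy logistic-regression model. The sketch below is a hypothetical illustration, not the paper's code: `regularized_loss` and the choice of an L1 penalty are assumptions for demonstration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_loss(w, b, x, y, lam=0.1):
    """Cross-entropy plus an L1 penalty on the input gradient (a sketch).

    For logistic regression, the input gradient of the per-example loss
    is (p - y) * w, so penalizing its L1 norm nudges the model toward
    sparser, arguably more interpretable saliency maps.
    """
    p = sigmoid(x @ w + b)
    ce = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    input_grad = (p - y)[:, None] * w[None, :]  # d(loss_i)/d(x_i), computed in closed form
    return ce + lam * np.mean(np.abs(input_grad).sum(axis=1))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = (x[:, 0] > 0).astype(float)
w = rng.normal(size=4)
loss = regularized_loss(w, 0.0, x, y, lam=0.1)
```

Setting `lam=0` recovers plain cross-entropy; increasing it trades accuracy for smoother, sparser attributions, which is the robustness–interpretability trade-off the paper studies empirically.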

Proper Network Interpretability Helps Adversarial Robustness in Classification

AkhilanB/Proper-Interpretability ICML 2020

Recent work has empirically shown that adversarial examples can be hidden from neural network interpretability (i.e., crafted so that interpretation maps remain visually similar), and that interpretability itself is susceptible to adversarial attacks.

Attribution Preservation in Network Compression for Reliable Network Interpretation

GeondoPark/attribute-preserve NeurIPS 2020

Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce their size for edge computing.

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

salesforce/lavis 17 Oct 2022

Visual question answering (VQA) is a hallmark of vision and language reasoning and a challenging task under the zero-shot setting.

Towards More Robust Interpretation via Local Gradient Alignment

joshua840/robustaga 29 Nov 2022

However, prior work has overlooked the normalization of attributions, which is essential to their visualization, and this has been an obstacle to understanding and improving the robustness of feature attribution methods.
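The normalization step at issue is the rescaling applied to an attribution map before it is rendered as a heatmap. A minimal sketch, assuming a linear scoring model and simple min-max rescaling (both are illustrative assumptions, not the paper's method):

```python
import numpy as np

def normalized_attribution(w, x):
    """Gradient-times-input attribution for a linear score w @ x,
    min-max rescaled to [0, 1] as is commonly done before visualization.

    Because this rescaling is applied per example, a small adversarial
    change to the raw attributions can look very different after
    normalization, which is why it matters for robustness analysis.
    """
    attr = w * x                        # for a linear model, the exact input gradient times input
    a = np.abs(attr)
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a)

w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 1.0, 1.0])
heat = normalized_attribution(w, x)     # largest |attribution| maps to 1.0, smallest to 0.0
```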