An evaluation of quality and robustness of smoothed explanations

Explanation methods play a crucial role in understanding the decisions of deep neural networks (DNNs) and in building the trust that is critical for the adoption of predictive models. However, explanation methods are easily manipulated through visually imperceptible perturbations that generate misleading explanations. The geometry of the decision surface of DNNs has been identified as the main cause of this phenomenon, and several \emph{smoothing} approaches have been proposed to build more robust explanations. In this work, we provide a thorough evaluation of the quality and robustness of the explanations derived by smoothing approaches. Their different properties are evaluated with extensive experiments, which reveal the settings in which smoothed explanations are better, and those in which they are worse, than explanations derived by the common Gradient method. By making the connection with the literature on adversarial attacks, we further show that such smoothed explanations are robust primarily against additive $\ell_p$-norm attacks. However, a combination of additive and non-additive attacks can still manipulate these explanations, which reveals shortcomings in their robustness properties.
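A common smoothing approach of the kind evaluated here averages gradient explanations over noise-perturbed copies of the input (as in SmoothGrad). The following is a minimal NumPy sketch under assumed, simplified conditions: `grad_fn`, `sigma`, and `n_samples` are illustrative names, and the toy quadratic model stands in for a real DNN.

```python
import numpy as np

def gradient_explanation(grad_fn, x):
    """Plain Gradient explanation: gradient of the model output w.r.t. the input."""
    return grad_fn(x)

def smoothed_explanation(grad_fn, x, sigma=0.1, n_samples=50, seed=0):
    """Smoothed explanation: average gradients over Gaussian-perturbed inputs.

    Hypothetical sketch of the smoothing idea, not the paper's exact method.
    """
    rng = np.random.default_rng(seed)
    grads = [grad_fn(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

# Toy differentiable "model": f(x) = sum(x**2), so grad f(x) = 2*x.
grad_fn = lambda x: 2.0 * x
x = np.array([1.0, -2.0, 0.5])

g = gradient_explanation(grad_fn, x)
sg = smoothed_explanation(grad_fn, x, sigma=0.1, n_samples=200)
```

For this linear-gradient toy model the noise averages out, so the smoothed explanation stays close to the plain gradient; for real DNNs with highly curved decision surfaces, the averaging instead suppresses the high-frequency gradient fluctuations that attacks exploit.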
