Variational Perturbations for Visual Feature Attribution
Explaining a complex black-box system in a post-hoc manner is important for understanding its predictions. In this work we focus on two objectives: how well the estimated explanation describes the classifier's behavior (faithfulness), and how sensitive the explanation is to input variations or model configurations (robustness). To achieve both faithfulness and robustness, we propose an uncertainty-aware explanation model, Variational Perturbations (VP), that learns a distribution over feature attributions for each input image and the corresponding classifier outputs. This differs from existing methods, which learn a single deterministic estimate of feature attribution. According to several robustness and faithfulness metrics, our VP method provides more reliable explanations than state-of-the-art methods on the MNIST, CUB, and ImageNet datasets, while also being more efficient.
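The core idea, learning a distribution over attribution maps rather than a single deterministic map, can be sketched in a few lines. This is not the authors' implementation; it is a minimal NumPy illustration of sampling attribution masks from a per-pixel Gaussian via the reparameterization trick and summarizing them with a Monte-Carlo mean and an uncertainty estimate. All names (`mu`, `log_var`, the map size) are hypothetical placeholders; in VP these parameters would be produced by a learned explanation network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned per-pixel parameters of the attribution distribution
# (in VP these would come from an explanation network; here they are random).
H, W = 8, 8
mu = rng.normal(size=(H, W))        # mean attribution logits
log_var = rng.normal(size=(H, W))   # log-variance, encoding uncertainty

def sample_attribution(mu, log_var, rng):
    """Reparameterization trick: draw one attribution map from N(mu, sigma^2),
    squashed to [0, 1] so it can act as a soft perturbation mask."""
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid

# Monte-Carlo estimate of the mean attribution and its pixel-wise uncertainty
samples = np.stack([sample_attribution(mu, log_var, rng) for _ in range(64)])
mean_attr = samples.mean(axis=0)
std_attr = samples.std(axis=0)

print(mean_attr.shape, std_attr.shape)
```

A deterministic method would return only something like `mean_attr`; keeping the full distribution additionally yields `std_attr`, a per-pixel measure of how sensitive the explanation is, which is what the robustness objective above exploits.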