Axiomatic Attribution for Deep Networks

ICML 2017  ·  Mukund Sundararajan, Ankur Taly, Qiqi Yan

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that most known attribution methods do not satisfy them, which we consider a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models, and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
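The abstract's claim that the method "just needs a few calls to the standard gradient operator" can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch implementation of the usual Riemann-sum approximation of Integrated Gradients; the function name integrated_gradients, the zero baseline, and the steps parameter are assumptions for illustration rather than the paper's own code, and the model is assumed to map a batch of inputs to per-class scores.

```python
import torch

def integrated_gradients(model, x, target, baseline=None, steps=50):
    """Riemann-sum sketch of Integrated Gradients for one unbatched input x.

    IG_i(x) ~= (x_i - x'_i) * average over k of dF_target/dx_i evaluated at
    the interpolated points x' + (k/steps) * (x - x').
    """
    if baseline is None:
        # Zero baseline (e.g. an all-black image); other baselines are possible.
        baseline = torch.zeros_like(x)
    # Points along the straight-line path from the baseline to the input.
    alphas = torch.linspace(0.0, 1.0, steps + 1).view(-1, *([1] * x.dim()))
    interpolated = (baseline + alphas * (x - baseline)).requires_grad_(True)
    # One batched forward pass and one gradient call cover all interpolation steps.
    scores = model(interpolated)[:, target].sum()
    grads = torch.autograd.grad(scores, interpolated)[0]
    avg_grads = grads.mean(dim=0)          # average gradient along the path
    return (x - baseline) * avg_grads      # scale by the input difference

# Hypothetical usage, assuming `net` is an image classifier and `img` a normalized tensor:
# attributions = integrated_gradients(net, img, target=predicted_class)
```

In practice the path integral is approximated with a modest number of steps; the method's completeness property (attributions summing to F(x) minus F at the baseline) can serve as a sanity check that the chosen step count is adequate.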


Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Image Attribution | CelebA | Integrated Gradients | Insertion AUC score (ArcFace ResNet-101) | 0.3578 | #8
Image Attribution | CelebA | Integrated Gradients | Deletion AUC score (ArcFace ResNet-101) | 0.0680 | #1
Interpretability Techniques for Deep Learning | CelebA | Integrated Gradients | Insertion AUC score | 0.3578 | #7
Image Attribution | CUB-200-2011 | Integrated Gradients | Insertion AUC score (ResNet-101) | 0.0422 | #8
Image Attribution | CUB-200-2011 | Integrated Gradients | Deletion AUC score (ResNet-101) | 0.0728 | #5
Image Attribution | VGGFace2 | Integrated Gradients | Insertion AUC score (ArcFace ResNet-101) | 0.5399 | #7
Image Attribution | VGGFace2 | Integrated Gradients | Deletion AUC score (ArcFace ResNet-101) | 0.0749 | #1

Methods


Integrated Gradients