Axiomatic Attribution for Deep Networks
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that most known attribution methods violate at least one of them, which we consider a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to two image models, two text models, and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
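The abstract's claim that the method "just needs a few calls to the standard gradient operator" can be made concrete with a minimal sketch. Integrated Gradients attributes feature i as (x_i − x'_i) times the path integral of the gradient along the straight line from a baseline x' to the input x; in practice the integral is approximated with a Riemann sum. The sketch below assumes only a `grad_fn` callable returning the model's input gradient (here a toy quadratic model, not one of the paper's networks):

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Riemann-sum approximation of Integrated Gradients.

    grad_fn(z) must return dF/dz, the gradient of the model output
    with respect to its input, evaluated at z.
    """
    # Midpoints of `steps` intervals along the straight-line path
    # from the baseline to the input.
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    avg_grad = total / steps
    # Scale the averaged gradients by the input-baseline difference.
    return (x - baseline) * avg_grad

# Toy model F(x) = sum(x**2), whose gradient is 2*x.
grad_fn = lambda z: 2.0 * z
x = np.array([1.0, -2.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_fn, x, baseline, steps=100)
# Completeness axiom: attributions sum to F(x) - F(baseline) = 5.0.
```

For this quadratic toy model the attributions are exactly `[1.0, 4.0]`, and their sum equals F(x) − F(baseline), illustrating the completeness property that the paper derives for the exact path integral.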
ICML 2017
Benchmark Results
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Image Attribution | CelebA | Integrated Gradients | Insertion AUC score (ArcFace ResNet-101) | 0.3578 | #8
Image Attribution | CelebA | Integrated Gradients | Deletion AUC score (ArcFace ResNet-101) | 0.0680 | #1
Interpretability Techniques for Deep Learning | CelebA | Integrated Gradients | Insertion AUC score | 0.3578 | #7
Image Attribution | CUB-200-2011 | Integrated Gradients | Insertion AUC score (ResNet-101) | 0.0422 | #8
Image Attribution | CUB-200-2011 | Integrated Gradients | Deletion AUC score (ResNet-101) | 0.0728 | #5
Image Attribution | VGGFace2 | Integrated Gradients | Insertion AUC score (ArcFace ResNet-101) | 0.5399 | #7
Image Attribution | VGGFace2 | Integrated Gradients | Deletion AUC score (ArcFace ResNet-101) | 0.0749 | #1