A Benchmark for Interpretability Methods in Deep Neural Networks

NeurIPS 2019 · Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim

We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are no better than a random designation of feature importance...
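The abstract measures an estimator against a random designation of feature importance. The core step of that comparison can be sketched as follows, under two assumptions: an input is a flat list of features, and "removing" a feature means overwriting it with an uninformative fill value (0.0 here). The helper names (`degrade`, `random_importance`, `score`), the toy linear scorer `WEIGHTS`, and the fill value are hypothetical. This is not the authors' implementation. As we understand the full paper, its protocol retrains the model on the degraded data before measuring test accuracy, and that retraining is omitted here. The sketch only shows the ranking-and-masking step and the random baseline.

```python
import random
from typing import List, Sequence


def degrade(features: Sequence[float], importance: Sequence[float],
            fraction: float, fill: float) -> List[float]:
    """Overwrite the `fraction` of features ranked most important with `fill`."""
    if len(features) != len(importance):
        raise ValueError("features and importance must have the same length")
    k = int(round(fraction * len(features)))
    # Indices sorted by descending importance; sorted() is stable, so ties
    # keep their original order.
    ranked = sorted(range(len(features)), key=lambda i: importance[i], reverse=True)
    out = list(features)
    for i in ranked[:k]:
        out[i] = fill
    return out


def random_importance(n: int, seed: int = 0) -> List[float]:
    """Random designation of importance: the baseline an estimator should beat."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Hypothetical toy "model": a fixed linear scorer. A real evaluation would
# retrain a network on degraded inputs and report test accuracy instead.
WEIGHTS = [0.0, 0.0, 5.0, 0.0, 3.0, 0.0, 0.0, 1.0]


def score(features: Sequence[float]) -> float:
    """Evidence the toy model assigns to an input."""
    return sum(w * x for w, x in zip(WEIGHTS, features))


if __name__ == "__main__":
    x = [1.0] * len(WEIGHTS)
    fill = 0.0  # assumed uninformative value
    # Hypothetical estimator: |weight|, which ranks features perfectly for this toy model.
    estimator = [abs(w) for w in WEIGHTS]
    baseline = random_importance(len(x), seed=0)
    for frac in (0.25, 0.5):
        est = score(degrade(x, estimator, frac, fill))
        rnd = score(degrade(x, baseline, frac, fill))
        print(f"fraction={frac}: estimator score={est}, random score={rnd}")
```

A useful estimator should lower the score at least as fast as the random ranking as more features are removed. An estimator whose curve tracks the random baseline is, in the sense of the abstract, no better than a random designation of importance.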

