Robust Black Box Explanations Under Distribution Shift

As machine learning black boxes are increasingly being deployed in real-world applications, there has been a growing interest in developing post hoc explanations that summarize the behaviors of these black box models. However, existing algorithms for generating such explanations have been shown to lack robustness with respect to shifts in the underlying data distribution. In this paper, we propose a novel framework for generating robust explanations of black box models based on adversarial training. In particular, our framework optimizes a minimax objective that aims to construct the highest fidelity explanation with respect to the worst-case over a set of distribution shifts. We instantiate this algorithm for explanations in the form of linear models and decision sets by devising the required optimization procedures. To the best of our knowledge, this work makes the first attempt at generating post hoc explanations that are robust to a general class of distribution shifts that are of practical interest. Experimental evaluation with real-world and synthetic datasets demonstrates that our approach substantially improves the robustness of explanations without sacrificing their fidelity on the original data distribution.
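The minimax idea described above can be sketched in code. The following is a minimal illustration only, not the paper's actual algorithm: it assumes a simple ℓ∞-box family of input shifts as the "set of distribution shifts", approximates the inner maximization with a few signed-gradient steps, and fits a linear explanation by gradient descent on the worst-case squared fidelity loss. All function names, the shift family, and the hyperparameters are hypothetical choices for the sketch.

```python
import numpy as np

def fit_robust_linear_explanation(X, black_box, eps=0.5, steps=100,
                                  lr=0.05, inner_steps=10):
    """Illustrative minimax fit (not the paper's exact procedure):
    find linear weights (w, b) minimizing worst-case fidelity loss
    mean((Xs @ w + b - black_box(Xs))**2) over shifted inputs
    Xs = X + delta with ||delta||_inf <= eps."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        # Inner maximization: approximate the worst-case shift of each
        # point with a few signed-gradient ascent steps, projected back
        # into the eps-box around the original point. The gradient of
        # the black box itself is ignored here for simplicity.
        Xs = X.copy()
        for _ in range(inner_steps):
            resid = Xs @ w + b - black_box(Xs)
            grad = 2.0 * resid[:, None] * w[None, :]  # d(resid^2)/dXs, f frozen
            Xs = X + np.clip(Xs + 0.1 * np.sign(grad) - X, -eps, eps)
        # Outer minimization: one gradient step on (w, b) evaluated at
        # the (approximately) worst-case points.
        resid = Xs @ w + b - black_box(Xs)
        w -= lr * (2.0 / n) * (Xs.T @ resid)
        b -= lr * (2.0 / n) * resid.sum()
    return w, b
```

On a toy black box, e.g. `black_box = lambda Z: Z @ np.array([1.0, 2.0]) + 0.5`, the fitted `(w, b)` tracks the black box closely on the original data while the alternating updates keep the fidelity loss low at the shifted points as well; the decision-set instantiation mentioned in the abstract would require a discrete search in place of the gradient step.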

ICML 2020
