GANMEX: Class-Targeted One-vs-One Attributions using GAN-based Model Explainability

1 Jan 2021  ·  Sheng-Min Shih, Pin-Ju Tien, Zohar Karnin

Attribution methods have shown promise for identifying the key features behind learned model predictions. While most existing attribution methods rely on a baseline input when performing feature perturbations, limited research has addressed the problem of baseline selection. A poor choice of baseline can lead to unfair attributions and limits the ability to produce one-vs-one explanations for multi-class classifiers, i.e., explanations of why an input belongs to its original class rather than another specified target class. One-vs-one explanations are crucial when certain classes are more similar than others, e.g., two bird types among multiple animals, because they focus on the key differentiating features rather than on features shared by the original and target classes. In this paper, we present GANMEX, a novel algorithm based on Generative Adversarial Networks (GANs) that incorporates the to-be-explained classifier as part of the adversarial network. Our approach selects the baseline as the closest realistic sample belonging to the target class, which allows attribution methods to provide true one-vs-one explanations. We show that GANMEX baselines improve saliency maps visually and lead to stronger performance on perturbation-based evaluation metrics than existing baselines. Attribution results with existing baselines are known to be insensitive to model randomization, and we demonstrate that GANMEX baselines lead to better outcomes under randomization sanity checks.
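
To make the idea concrete, below is a minimal PyTorch sketch of a GANMEX-style generator objective as one might infer it from the abstract: the generator maps an input to a baseline that (1) looks realistic to a discriminator, (2) is classified as the target class by the frozen, to-be-explained classifier, and (3) stays close to the original input. All names (generator, discriminator, clf, the lambda_* weights) and the specific loss terms are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def ganmex_generator_loss(generator, discriminator, clf, x, target_class,
                              lambda_adv=1.0, lambda_cls=1.0, lambda_prox=0.1):
        """Hypothetical generator loss for producing a class-targeted baseline.

        generator: maps an input batch x to candidate baseline images.
        discriminator: outputs a realism logit per image.
        clf: the frozen, to-be-explained classifier.
        """
        baseline = generator(x)

        # (1) Realism: non-saturating GAN loss; the discriminator should
        # believe the generated baseline is a real sample.
        adv_loss = -torch.log(torch.sigmoid(discriminator(baseline)) + 1e-8).mean()

        # (2) Class targeting: the frozen classifier should assign the
        # baseline to the specified target class.
        target = torch.full((x.size(0),), target_class,
                            dtype=torch.long, device=x.device)
        cls_loss = F.cross_entropy(clf(baseline), target)

        # (3) Proximity: keep the baseline close to the original input so it
        # is the *nearest* realistic sample of the target class.
        prox_loss = F.l1_loss(baseline, x)

        return lambda_adv * adv_loss + lambda_cls * cls_loss + lambda_prox * prox_loss

Once such a generator is trained, the resulting baseline can be fed to any baseline-dependent attribution method. For example, with Captum's Integrated Gradients one could call IntegratedGradients(clf).attribute(x, baselines=generator(x), target=original_class); this usage is an assumption about how the baseline would be consumed, not the paper's evaluation setup.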
