Bridging Adversarial Robustness and Gradient Interpretability

27 Mar 2019 · Beomsu Kim, Junghoon Seo, Taegyun Jeon

Adversarial training is a training scheme designed to counter adversarial attacks by augmenting the training dataset with adversarial examples. Surprisingly, several studies have observed that loss gradients from adversarially trained DNNs are visually more interpretable than those from standard DNNs...
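To make the two ideas in the abstract concrete (adversarial examples used as training-set augmentation, and the loss gradient with respect to the input), here is a minimal sketch in plain Python. It is not the authors' method: the logistic model, the one-step sign-gradient attack, and the names `input_gradient`, `sign_gradient_example`, and `augment` are assumptions chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on one example (y in {0, 1})."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x, which is (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign_gradient_example(w, b, x, y, eps):
    """Assumed attack: one step of size eps along the sign of the input gradient."""
    g = input_gradient(w, b, x, y)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def augment(w, b, data, eps):
    """Return the clean examples plus one adversarial copy of each."""
    return data + [(sign_gradient_example(w, b, x, y, eps), y) for x, y in data]

# Hypothetical toy model and dataset, for illustration only.
w, b = [2.0, -1.0], 0.0
data = [([1.0, 0.5], 1), ([-1.0, 0.0], 0)]
augmented = augment(w, b, data, eps=0.1)
```

In this sketch the augmented set would then be fed to an ordinary training step. The list returned by `input_gradient` is the quantity that, for image models, is typically visualized as a gradient map.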
