1 code implementation • 18 Dec 2020 • Jonathan Helland, Nathan VanHoudnos
In this work, we investigate the phenomenon that robust image classifiers have human-recognizable features, often referred to as interpretability, as revealed through the input gradients of their score functions and their subsequent adversarial perturbations.
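
To make the terms "input gradient" and "adversarial perturbation" concrete, here is a minimal sketch on a toy two-class linear softmax classifier. It is not the authors' code or experimental setup: the weights `W`, bias `b`, input `x`, step size `eps`, and every function name are hypothetical, and the gradient-sign step is one common way to build a perturbation from an input gradient, assumed here only for illustration.

```python
import math

def scores(W, b, x):
    # Linear score function: one score per class, s_k = W[k] . x + b[k].
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]

def softmax(z):
    # Numerically stable softmax over a list of scores.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def input_gradient(W, b, x, label):
    # Gradient of the cross-entropy loss -log softmax(s)[label] with respect
    # to the input x. For a linear model this is W^T (p - onehot(label)).
    p = softmax(scores(W, b, x))
    dz = [p_k - (1.0 if k == label else 0.0) for k, p_k in enumerate(p)]
    return [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign_perturbation(x, grad, eps):
    # Move each input coordinate by eps in the direction that increases the loss.
    return [v + eps * ((g > 0) - (g < 0)) for v, g in zip(x, grad)]

def predict(W, b, x):
    s = scores(W, b, x)
    return s.index(max(s))

# Hypothetical toy values, chosen only so the effect is easy to follow.
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x = [0.5, 0.2]
label = 0

grad = input_gradient(W, b, x, label)
x_adv = sign_perturbation(x, grad, eps=0.4)
pred_clean = predict(W, b, x)
pred_adv = predict(W, b, x_adv)
```

In this toy case the gradient points away from the first coordinate and toward the second, so the perturbed input moves from class 0 to class 1. For an image classifier, the same gradient has one entry per pixel and can be displayed as an image, which is the sense in which input gradients can be inspected for human-recognizable structure.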