Certified Defenses against Adversarial Examples

ICLR 2018 • Aditi Raghunathan • Jacob Steinhardt • Percy Liang

While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but they are often followed by new, stronger attacks that defeat them. Can we somehow end this arms race?

