Explaining and Harnessing Adversarial Examples

20 Dec 2014 · Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy

Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting...
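
To make the idea of a small, worst-case perturbation concrete, here is a minimal sketch rather than the paper's implementation. It assumes a hypothetical logistic-regression model with made-up weights over 784 features scaled like pixels in [0, 1], and it uses a sign-of-gradient step chosen for illustration; the truncated abstract above does not specify a method. The names `predict`, `worst_case_perturbation`, and `eps` are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def worst_case_perturbation(w, b, x, y, eps):
    """Shift every feature by at most eps in the direction that raises the loss.

    For cross-entropy loss on this model, d(loss)/d(x_i) = (p - y) * w_i,
    so the step only needs the sign of that quantity.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model: 784 features with small weights of alternating sign.
n = 784
w = [0.05 if i % 2 == 0 else -0.05 for i in range(n)]
b = 0.0

# Hypothetical input that the model assigns to class 1 with high confidence.
x = [0.6 if wi > 0 else 0.4 for wi in w]
y = 1

eps = 0.2  # assumed max-norm budget: no feature moves by more than this
x_adv = worst_case_perturbation(w, b, x, y, eps)

print("clean     P(class 1):", predict(w, b, x))
print("perturbed P(class 1):", predict(w, b, x_adv))
```

No single feature changes by more than `eps`, yet the per-feature shifts all push the logit the same way, so their effect accumulates across the 784 dimensions and the model ends up confident in the wrong class.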

Evaluation results from the paper

Task: Image Classification
Dataset: MNIST
Model: Explaining and Harnessing Adversarial Examples
Metric name: Percentage error
Metric value: 0.8
Global rank: #11