Generating Natural Adversarial Examples

ICLR 2018 · Zhengli Zhao • Dheeru Dua • Sameer Singh

Due to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e., inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to complicated domains such as language.
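To make the kind of perturbation the abstract critiques concrete, here is a minimal sketch of a standard gradient-sign (FGSM-style) attack, the classic example of a small but "unnatural" perturbation that flips a model's prediction. This is not the paper's method; the model, `epsilon` value, and pixel-range assumptions are illustrative only.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Add a small sign-of-gradient perturbation to x that tends to change model(x)'s prediction.

    Assumes x is an image batch scaled to [0, 1] and y holds the true class labels.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip back to a valid pixel range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

The resulting perturbation is imperceptible in pixel space but carries no semantic meaning, which is precisely the limitation the paper sets out to address with more natural adversarial examples.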
