535 papers with code • 6 benchmarks • 8 datasets
Adversarial Robustness evaluates the vulnerabilities of machine learning models under various types of adversarial attacks.
Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the $\ell_2$ norm.
The field of defense strategies against adversarial attacks has significantly grown over the last years, but progress is hampered as the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness.
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.
Recent studies have highlighted the vulnerability of deep neural networks (DNNs) to adversarial examples - a visually indistinguishable adversarial image can easily be crafted to cause a well-trained model to misclassify.
Though deep neural networks have achieved significant progress on various tasks, often enhanced by model ensemble, existing high-performance models can be vulnerable to adversarial attacks.
In particular, against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our model reaches 64. 20% robust accuracy without using any external data, beating most prior works that use external data.
Defending Machine Learning models involves certifying and verifying model robustness and model hardening with approaches such as pre-processing inputs, augmenting training data with adversarial samples, and leveraging runtime detection methods to flag any inputs that might have been modified by an adversary.