An Adversarial Attack is a technique for finding a perturbation that changes the prediction of a machine learning model. The perturbation can be very small and imperceptible to the human eye.
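A minimal sketch of one classic way to find such a perturbation, the fast gradient sign method (FGSM), written in PyTorch; the model, inputs, and epsilon value here are placeholders, not a specific paper's setup:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8 / 255):
    """One gradient step that maximizes the loss (FGSM).

    `model`, `x` (input batch), and `y` (true labels) are placeholders;
    `eps` bounds the L-infinity norm of the perturbation, so each pixel
    moves by at most eps and the change stays visually small.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()   # sign-based step of size eps per pixel
    return x_adv.clamp(0, 1).detach() # keep the result a valid image
```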
Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
An adversarial example library for constructing attacks, building defenses, and benchmarking both.
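A minimal usage sketch, assuming the CleverHans 4.x PyTorch interface; the toy model and random inputs below are stand-ins for a real classifier and dataset:

```python
import torch
from cleverhans.torch.attacks.fast_gradient_method import fast_gradient_method

# Toy stand-in classifier; any PyTorch module returning logits works.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)  # batch of random "images" in [0, 1]

# One-call FGSM: returns the perturbed batch under an L-infinity budget.
x_adv = fast_gradient_method(model, x, eps=0.03, norm=float("inf"))
```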
In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.
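A simplified sketch of the forward-derivative idea behind exploiting that input-output mapping, not the paper's full saliency-map algorithm; `model`, `x`, and `target` are placeholders:

```python
import torch

def most_salient_feature(model, x, target):
    """Find the input feature whose increase most raises the target logit.

    A sketch of the Jacobian/saliency idea only: `x` is a single-example
    batch, `target` is the class the attacker wants. The full attack
    iteratively perturbs the features this search identifies.
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    logits[0, target].backward()       # gradient of target logit w.r.t. input
    grad = x.grad.flatten()
    return grad.argmax().item()        # index of the most influential feature
```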
Based on this observation, we propose a defense approach that inspects the graph and recovers potential adversarial perturbations.
Evaluating adversarial robustness amounts to finding the minimum perturbation needed to have an input sample misclassified.
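For norm-bounded attacks, one common way to estimate that minimum is a binary search over the perturbation budget. A minimal sketch, where `is_misclassified(eps)` is an illustrative oracle (e.g., a closure that runs a fixed attack on a fixed input at budget `eps`):

```python
def minimal_eps(is_misclassified, lo=0.0, hi=1.0, steps=20):
    """Binary-search the smallest perturbation budget that flips the label.

    `is_misclassified(eps)` is a placeholder oracle reporting whether some
    fixed attack succeeds at budget `eps`. Assumes success is monotone in
    eps, which holds for most norm-bounded attacks.
    """
    if not is_misclassified(hi):
        return None  # even the largest budget fails to flip the label
    for _ in range(steps):
        mid = (lo + hi) / 2
        if is_misclassified(mid):
            hi = mid  # attack succeeds: try a smaller budget
        else:
            lo = mid  # attack fails: need a larger budget
    return hi
```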
Foolbox is a new Python package to generate such adversarial perturbations and to quantify and compare the robustness of machine learning models.
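A minimal sketch of Foolbox's PyTorch interface as of Foolbox 3; the model and data below are toy stand-ins:

```python
import torch
import foolbox as fb

# Toy stand-in classifier and data; swap in a trained model in practice.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10)).eval()
fmodel = fb.PyTorchModel(model, bounds=(0, 1))
images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

# Run a bounded L-infinity PGD attack and measure how often it succeeds.
attack = fb.attacks.LinfPGD()
raw, clipped, is_adv = attack(fmodel, images, labels, epsilons=0.03)
print(f"fooled {is_adv.float().mean().item():.0%} of the batch at eps=0.03")
```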
TextAttack also includes data augmentation and adversarial training modules for using components of adversarial attacks to improve model accuracy and robustness.
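A brief sketch of the augmentation side, assuming TextAttack's WordNet-based augmenter; the parameter values here are illustrative, not recommended settings:

```python
from textattack.augmentation import WordNetAugmenter

# Swap a fraction of words for WordNet synonyms to augment training data.
augmenter = WordNetAugmenter(pct_words_to_swap=0.2,
                             transformations_per_example=2)
print(augmenter.augment("The movie was surprisingly good"))
```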
Adversarial attacks for discrete data (such as text) have proved significantly more challenging than attacks for continuous data (such as images), since it is difficult to generate adversarial samples with gradient-based methods.
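A library-free sketch of the discrete search this implies: a greedy synonym-substitution attack, where `classify` and `synonyms` are placeholder inputs and no gradients are used anywhere:

```python
def greedy_word_swap(classify, words, synonyms, threshold=0.5):
    """Greedy discrete search over synonym substitutions.

    `classify(words)` is a placeholder returning the true-label probability;
    `synonyms[word]` is a placeholder synonym table. Each swap that lowers
    the true-label score is kept; the search stops once the score drops
    below `threshold` (i.e., the label flips for a binary classifier).
    """
    words = list(words)
    best = classify(words)
    for i, w in enumerate(words):
        for cand in synonyms.get(w, []):
            trial = words[:i] + [cand] + words[i + 1:]
            score = classify(trial)
            if score < best:          # swap helps: keep it
                words, best = trial, score
            if best < threshold:      # label flipped: done
                return words
    return words
```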