The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Section 1 provides an overview of adversarial examples in machine learning and of the CleverHans software.
Foolbox is a new Python package for generating such adversarial perturbations and for quantifying and comparing the robustness of machine learning models. The code is licensed under the MIT license and is openly available at https://github.com/bethgelab/foolbox.
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find that defenses relying on this effect can be circumvented.
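To make the circumvention idea concrete, here is a minimal sketch (assuming NumPy; the quantization "defense", the linear model `w`, and all parameters are hypothetical stand-ins, not the defenses evaluated in the paper) of the backward-pass differentiable approximation: a non-differentiable input transform is treated as the identity on the backward pass, so a gradient-based attack goes through despite the masked gradient.

```python
import numpy as np

def quantize(x, levels=8):
    """Hypothetical input 'defense': non-differentiable, so naive gradients
    through it are zero almost everywhere (obfuscated gradients)."""
    return np.round(x * (levels - 1)) / (levels - 1)

def bpda_attack(x, w, eps=0.25, steps=20):
    """Attack a hypothetical linear model score = w . quantize(x) by
    approximating quantize with the identity on the backward pass."""
    alpha = eps / steps
    x_adv = x.copy()
    for _ in range(steps):
        grad = w  # d(w . x)/dx = w; quantize is treated as the identity
        x_adv = np.clip(x_adv - alpha * np.sign(grad), x - eps, x + eps)
    return np.clip(x_adv, 0.0, 1.0)

rng = np.random.default_rng(2)
w = rng.normal(size=64)
x = rng.uniform(0.3, 0.7, size=64)  # inputs away from the [0, 1] boundary
adv = bpda_attack(x, w)
```

Even though quantization zeroes the true gradient, the identity approximation still drives the defended score `w @ quantize(x)` down while staying inside the L-infinity ball.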
State-of-the-art deep neural networks have achieved impressive results on many image classification tasks. However, these same architectures have been shown to be unstable to small, carefully crafted perturbations of the images.
This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising.
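A rough sketch of the feature-denoising idea (assuming NumPy; the paper's actual blocks use learned 1×1 convolutions and stronger denoisers such as non-local means, whereas a plain 3×3 mean filter stands in here):

```python
import numpy as np

def mean_filter(feat):
    """3x3 mean filter over an (H, W) feature map (edge-padded)."""
    padded = np.pad(feat, 1, mode="edge")
    H, W = feat.shape
    out = np.empty_like(feat)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

def denoising_block(feat):
    """Denoise, then add back via a residual connection (the paper inserts a
    learned 1x1 conv on the denoised features before the sum; omitted here)."""
    return feat + mean_filter(feat)

rng = np.random.default_rng(0)
noisy_features = rng.normal(size=(32, 32))  # stand-in for perturbed activations
smoothed = mean_filter(noisy_features)
```

The residual connection lets the block learn to suppress only the adversarial noise component while passing the signal through unchanged.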
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well.
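The paper's certificate comes from a dual (LP-based) relaxation of the network; the general principle can be illustrated with a cruder interval-arithmetic relaxation (assuming NumPy; the toy network and its weights are hypothetical, chosen only to make the certificate check concrete):

```python
import numpy as np

def interval_bounds(lower, upper, weights, biases):
    """Propagate an L-inf input box through a small ReLU net with interval
    arithmetic. Cruder than the paper's dual relaxation, but the same idea:
    sound outer bounds on the reachable logits can certify that no
    perturbation inside the box changes the predicted label."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        center = (lower + upper) / 2
        radius = (upper - lower) / 2
        center, radius = W @ center + b, np.abs(W) @ radius
        lower, upper = center - radius, center + radius
        if i < len(weights) - 1:  # ReLU on hidden layers only
            lower, upper = np.maximum(lower, 0), np.maximum(upper, 0)
    return lower, upper

# Toy 2-layer net (hypothetical weights) and an eps = 0.1 box around x.
weights = [np.eye(2), np.eye(2)]
biases = [np.zeros(2), np.array([1.0, -1.0])]
x, eps = np.array([0.5, 0.5]), 0.1
lo, up = interval_bounds(x - eps, x + eps, weights, biases)
# lo[0] > up[1] certifies class 0 for every perturbation within the box.
```

When the lower bound of the true class's logit exceeds the upper bound of every other logit, the example is provably robust; when it does not, the example is flagged, which is what yields the guaranteed-detection behavior on unseen inputs.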
First, with HGD as a defense, the target model becomes more robust to both white-box and black-box adversarial attacks. Second, HGD can be trained on a small subset of the images and generalizes well to other images and unseen classes.
Due to their complex nature, machine learning models are hard to characterize in terms of how they can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e., inputs with minor perturbations that result in substantially different model predictions, helps evaluate the robustness of these models by exposing the adversarial scenarios where they fail.
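A concrete instance of such a minor perturbation, in a minimal sketch (assuming NumPy; the linear classifier and all values are hypothetical, constructed so the input sits just on the class-1 side of the decision boundary):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100)   # hypothetical linear classifier: class 1 iff w . x > 0
x = rng.normal(size=100)
eps = 0.01

# Place x just on the class-1 side of the boundary: project onto the
# boundary, then add a margin smaller than eps * ||w||_1.
x = x - (w @ x) / (w @ w) * w
x = x + 0.4 * eps * np.abs(w).sum() / (w @ w) * w

# Fast-gradient-sign perturbation (Goodfellow et al.): step every input
# coordinate by eps against the gradient of the score, which here is just w.
x_adv = x - eps * np.sign(w)
# |x_adv - x| <= eps in every coordinate, yet the predicted class flips.
```

The perturbation has L-infinity norm 0.01, negligible next to input coordinates of order one, yet `w @ x_adv` lands on the other side of the boundary, which is exactly the failure mode such robustness evaluations aim to expose.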
Due to the complex nature of deep learning, it is challenging to understand how deep models can be fooled by adversarial examples. Such understanding can also help security experts explore further vulnerabilities of deep learning when it is deployed as a software module.
However, most existing adversarial attacks can only fool a black-box model with a low success rate. To further improve the success rates of black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that adversarially trained models with strong defenses are also vulnerable to our black-box attacks.
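The momentum iterative update can be sketched as follows (assuming NumPy; `grad_fn` stands in for the input-gradient of an ensemble's loss, and the quadratic toy loss at the bottom is purely illustrative):

```python
import numpy as np

def mi_fgsm(x, grad_fn, eps=0.03, steps=10, mu=1.0):
    """Momentum iterative FGSM sketch: accumulate an L1-normalized gradient
    with decay factor mu, step by sign(momentum), and stay in the eps-ball."""
    alpha = eps / steps
    g = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(steps):
        grad = grad_fn(x_adv)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # momentum term
        x_adv = x_adv + alpha * np.sign(g)                # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)          # L-inf projection
    return x_adv

# Toy usage on a hypothetical quadratic loss 0.5 * ||x - target||^2,
# whose input gradient is simply (x - target).
rng = np.random.default_rng(1)
x0 = rng.normal(size=16)
target = rng.normal(size=16)
adv = mi_fgsm(x0, lambda x: x - target)
```

The momentum term stabilizes the update direction across iterations, which is what helps the perturbation transfer to models outside the ensemble.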