Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our "free" adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks. The code is available at https://github.com/ashafahi/free_adv_train.
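The gradient-recycling idea in the abstract can be sketched on a toy problem. The block below is a minimal illustration, not the authors' implementation (see the linked repository for that). It assumes a linear logistic classifier, an L-infinity perturbation budget `eps`, and a minibatch-replay count `m` (the "free-m=4" setting named in the results). The function name `free_adv_train` and all of its parameters are hypothetical. For this model a single scalar `g` (the loss derivative with respect to the logit) yields both the weight gradient and the input gradient, so the adversarial perturbation is updated without a separate gradient computation.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def free_adv_train(batches, dim, eps=0.1, lr=0.1, m=4, epochs=20):
    """Toy 'free' adversarial training of a linear logistic classifier.

    batches: list of equally sized minibatches, each a list of (x, y)
    pairs with x a list of `dim` floats and y in {-1, +1}.
    Returns the learned weight vector.
    """
    w = [0.0] * dim
    batch_size = len(batches[0])
    # The perturbation persists across replays and minibatches (warm start).
    delta = [[0.0] * dim for _ in range(batch_size)]
    # Assumption: epochs are divided by m so the total number of
    # gradient passes matches ordinary training.
    for _ in range(max(1, epochs // m)):
        for batch in batches:
            for _ in range(m):  # replay the same minibatch m times
                grad_w = [0.0] * dim
                for i, (x, y) in enumerate(batch):
                    x_adv = [xj + dj for xj, dj in zip(x, delta[i])]
                    z = sum(wj * xj for wj, xj in zip(w, x_adv))
                    # dLoss/dz for loss = log(1 + exp(-y * z)).
                    # This one scalar feeds both gradients below.
                    g = -y / (1.0 + math.exp(y * z))
                    for j in range(dim):
                        # Weight gradient: g * x_adv[j], averaged over the batch.
                        grad_w[j] += g * x_adv[j] / batch_size
                        # Input gradient: g * w[j]. Take a signed ascent
                        # step, then clip back into the eps-ball.
                        step = delta[i][j] + eps * sign(g * w[j])
                        delta[i][j] = max(-eps, min(eps, step))
                # Descent step on the weights using the same pass.
                w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return w
```

Calling `free_adv_train(batches, dim=2)` on a small separable dataset returns a weight vector. For a linear model, the worst-case margin under an L-infinity attack of size `eps` is `y * (w · x) - eps * ||w||_1`, which gives a direct robustness check for the result.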

NeurIPS 2019
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Adversarial Defense | ImageNet (non-targeted PGD, max perturbation=4) | ResNet-152 free-m=4 | Accuracy | 36.0% | # 3 |
| Adversarial Defense | ImageNet (non-targeted PGD, max perturbation=4) | ResNet-101 free-m=4 | Accuracy | 34.3% | # 4 |
| Adversarial Defense | ImageNet (non-targeted PGD, max perturbation=4) | ResNet-50 free-m=4 | Accuracy | 31.8% | # 5 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (adv-train-free) | Accuracy - All Images | 26.7 | # 85 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (adv-train-free) | Accuracy - Corrupted Images | 20.5 | # 84 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (adv-train-free) | Accuracy - Clean Images | 30.9 | # 85 |
