Improved Generalization-Robustness Trade-off via Uncertainty Targeted Attacks

29 Sep 2021  ·  Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Tatjana Chavdarova

The sensitivity of deep learning models to small input perturbations raises security concerns and limits their use in applications where reliability is critical. While adversarial training methods, including the most widely used Projected Gradient Descent (PGD) method, aim to train more robust models, they often result in lower unperturbed (clean) test accuracy. In this work, we propose uncertainty-targeted attacks (UTA), in which the perturbations are obtained by maximizing the model's estimated uncertainty rather than its loss. We demonstrate on MNIST, Fashion-MNIST and CIFAR-10 that this approach does not drastically deteriorate clean test accuracy relative to PGD while remaining robust to PGD attacks. In particular, uncertainty-based attacks allow for using larger $L_\infty$-balls around the training data points, are less prone to overfitting the attack, and yield an improved generalization-robustness trade-off.
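
The abstract does not specify how the uncertainty is estimated or ascended; below is a minimal, hypothetical sketch of what an uncertainty-targeted perturbation could look like, assuming a PGD-style iteration that maximizes the predictive entropy of MC-dropout averaged softmax outputs instead of the training loss. The function name, hyperparameters, and the entropy-based uncertainty estimate are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def uncertainty_targeted_attack(model, x, eps=0.3, alpha=0.01, steps=40, mc_samples=8):
    """PGD-style attack that ascends an uncertainty estimate instead of the loss.

    Hypothetical sketch: uncertainty is taken as the predictive entropy of the
    mean softmax over MC-dropout forward passes; the paper's estimator may differ.
    """
    model.train()  # keep dropout layers stochastic for MC sampling
    # Random start inside the L_inf ball of radius eps around x
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()

    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Average softmax predictions over several stochastic forward passes
        probs = torch.stack(
            [F.softmax(model(x_adv), dim=1) for _ in range(mc_samples)]
        ).mean(0)
        # Predictive entropy serves as the uncertainty objective to maximize
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
        grad = torch.autograd.grad(entropy, x_adv)[0]
        # Ascend the uncertainty, then project back onto the eps-ball around x
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)

    return x_adv.detach()
```

In this sketch the adversarial example is used exactly like a PGD example during adversarial training; only the inner maximization objective changes from the cross-entropy loss to the uncertainty estimate.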
