Perturbation Deterioration: The Other Side of Catastrophic Overfitting

29 Sep 2021  ·  Zichao Li, Liyuan Liu, Chengyu Dong, Jingbo Shang

Our goal is to understand why robust accuracy abruptly drops to zero after FGSM-style adversarial training is run for too long. While this phenomenon is commonly explained as overfitting, we observe that it is a twin process: not only does the model catastrophically overfit to one type of perturbation, but the generated perturbations also deteriorate into random noise. For example, at the same epoch when the FGSM-trained model catastrophically overfits, its generated perturbations deteriorate into random noise. Intuitively, once the generated perturbations become weak and inadequate, the model is misguided to overfit these weak attacks and fails to defend against strong ones. In light of our analyses, we propose APART, an adaptive adversarial training method that parameterizes perturbation generation and progressively strengthens the generated perturbations. In our experiments, APART successfully prevents both perturbation deterioration and catastrophic overfitting. APART also significantly improves model robustness while maintaining the same efficiency as FGSM-style methods, e.g., on the CIFAR-10 dataset, APART achieves 53.89% accuracy under the PGD-20 attack and 49.05% accuracy under AutoAttack.
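The abstract does not include implementation details, so for context the sketch below shows one step of standard FGSM-style adversarial training with a random start, the baseline setting in which catastrophic overfitting and perturbation deterioration are observed. The names `model`, `x`, `y`, `optimizer`, and the values of `epsilon` and `alpha` are illustrative assumptions; APART's parameterized, progressively strengthened perturbation generation is not reproduced here.

```python
import torch
import torch.nn.functional as F

def fgsm_adv_train_step(model, x, y, optimizer, epsilon=8/255, alpha=10/255):
    """One FGSM-style adversarial training step (random-start variant).

    This is a minimal sketch of the baseline discussed in the abstract,
    not the paper's APART method, whose perturbation generator is
    parameterized and strengthened over training.
    """
    # Random initialization inside the epsilon-ball, as in "fast" FGSM training.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)

    # Single gradient-ascent step on the loss w.r.t. the input perturbation.
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()

    # Update the model on the adversarially perturbed batch.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```

According to the abstract, the perturbations produced by this single-step procedure can degrade into random noise late in training, which is the deterioration that APART is designed to prevent.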
