Towards Understanding Catastrophic Overfitting in Fast Adversarial Training

29 Sep 2021 · Renjie Chen, Yuan Luo, Yisen Wang

Since adversarial training was proposed, a series of works have focused on improving the computational efficiency of adversarial training for deep neural networks (DNNs). Recently, FGSM-based single-step adversarial training has been shown to produce models whose robustness is comparable to that obtained with multi-step PGD, while being an order of magnitude faster. However, it suffers from a failure mode called Catastrophic Overfitting (CO), in which the network suddenly loses its robustness against PGD attacks and can hardly recover by itself during training. In this paper, we identify that CO is closely related to the high-order terms in the Taylor expansion obtained by rethinking and decomposing the min-max problem in adversarial training. The negative high-order terms lead to a phenomenon called Perturbation Loss Distortion, which is the underlying cause of CO. Based on these observations, we propose a simple but effective regularization method named Fast Linear Adversarial Training (FLAT) to avoid CO in single-step adversarial training by making the loss surface flat.
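
Below is a minimal PyTorch-style sketch of FGSM-based single-step adversarial training, the setting in which CO arises. The optional flatness penalty only illustrates where a FLAT-style regularizer could enter the objective: the abstract does not specify the paper's exact regularization term, and names such as `fgsm_perturb`, `reg_weight`, and the `epsilon = 8/255` default are introduced here purely for illustration.

```python
import torch
import torch.nn.functional as F


def fgsm_perturb(model, x, y, epsilon):
    """One signed-gradient (FGSM) step inside an L-infinity ball of radius epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Single step along the gradient sign, then clip to the valid input range.
    return (x + epsilon * grad.sign()).clamp(0.0, 1.0).detach()


def train_epoch(model, loader, optimizer, epsilon=8 / 255, reg_weight=0.0):
    """Single-step (FGSM) adversarial training for one epoch.

    reg_weight > 0 enables a hypothetical local-linearity penalty that shrinks
    the gap between the adversarial loss and its first-order approximation
    around the clean input -- a stand-in for a "flat loss surface" term, not
    the paper's actual FLAT regularizer.
    """
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y, epsilon)
        loss_adv = F.cross_entropy(model(x_adv), y)
        loss = loss_adv

        if reg_weight > 0:
            x_clean = x.clone().detach().requires_grad_(True)
            loss_clean = F.cross_entropy(model(x_clean), y)
            grad_clean = torch.autograd.grad(loss_clean, x_clean, create_graph=True)[0]
            # First-order (linear) prediction of the adversarial loss.
            linear_pred = loss_clean + (grad_clean * (x_adv - x)).flatten(1).sum(1).mean()
            loss = loss + reg_weight * (loss_adv - linear_pred).abs()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

If the penalty is large, the loss is far from linear along the FGSM direction, which is the regime the abstract associates with Perturbation Loss Distortion and the onset of CO.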
