Entropy Weighted Adversarial Training

Adversarial training methods, which minimize the loss on adversarially perturbed training examples, have been extensively studied as a way to improve the robustness of deep neural networks. However, most adversarial training methods treat all training examples equally, even though each example may have a different impact on the model's robustness over the course of training. Recent works have exploited this unequal importance of adversarial samples to the model's robustness and have been shown to obtain high robustness against untargeted PGD attacks. However, we empirically observe that they cause the feature spaces of adversarial samples from different classes to overlap, and thus yield more high-entropy samples whose labels could be easily flipped. This makes them more vulnerable to targeted adversarial perturbations. To address this limitation, we propose a simple yet effective weighting scheme, Entropy-Weighted Adversarial Training (EWAT), which weighs the loss for each adversarial training example proportionally to the entropy of its predicted distribution, in order to focus on examples whose labels are more uncertain. We validate our method on multiple benchmark datasets and show that it achieves an impressive increase in robust accuracy.
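Below is a minimal sketch of the entropy-weighted loss idea described in the abstract, not the authors' reference implementation. The helper `pgd_attack`, the normalization of the weights, and the overall training loop structure are assumptions made for illustration; the paper only specifies that each adversarial example's loss is weighted proportionally to the entropy of its predicted distribution.

```python
# Hypothetical sketch of an entropy-weighted adversarial training loss.
# `model` and `pgd_attack` are assumed to be provided by the user; the
# weight normalization below is an illustrative choice, not the paper's.
import torch
import torch.nn.functional as F

def entropy_weighted_adv_loss(model, x, y, pgd_attack):
    """Per-example adversarial cross-entropy, weighted by the entropy
    of the model's predicted distribution on the adversarial example."""
    # Craft adversarial examples (e.g. with an untargeted PGD attack).
    x_adv = pgd_attack(model, x, y)

    logits = model(x_adv)
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)

    # Predictive entropy H(p) = -sum_c p_c * log p_c for each example.
    entropy = -(probs * log_probs).sum(dim=1)

    # Weight each example's loss proportionally to its entropy,
    # normalized so the weights average to 1 over the batch (assumed).
    weights = (entropy / (entropy.mean() + 1e-8)).detach()

    per_example_loss = F.cross_entropy(logits, y, reduction="none")
    return (weights * per_example_loss).mean()
```

High-entropy (uncertain) examples receive larger weights and therefore dominate the gradient, which matches the stated goal of focusing training on examples whose labels are more easily flipped.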
