Protect the weak: Class focused online learning for adversarial training

29 Sep 2021 · Thomas Pethick, Grigorios Chrysos, Volkan Cevher

Adversarial training promises a defense against adversarial perturbations in terms of average accuracy. In this work, we identify that the focus on the average accuracy metric can create vulnerabilities to the "weakest" class. For instance, on CIFAR10, where the average accuracy is 47%, the worst-class accuracy can be as low as 14%. The performance sacrifice of the weakest class can be detrimental for real-world systems if the threat model can adversarially choose the class to attack. To this end, we propose to explicitly minimize the worst-class error, which results in a min-max-max optimization formulation. We provide high-probability convergence guarantees of the worst-class loss for our method, dubbed class focused online learning (CFOL), which can be plugged into existing training setups with virtually no overhead in computation. We observe significant improvements on the worst-class accuracy of 30% for CIFAR10. We also observe consistent behavior across CIFAR100 and STL10. Intriguingly, we find that minimizing the worst case can sometimes even improve the average.
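To make the gap between average and worst-class accuracy concrete, the following is a minimal sketch of how the two metrics could be computed from predictions. The function names and the toy labels are assumptions made for illustration only; they are not taken from the paper and do not reproduce its results.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Return a dict mapping each class to its accuracy (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    return {c: correct[c] / total[c] for c in total}


def average_and_worst(labels, predictions):
    """Return (average accuracy, worst class, worst-class accuracy)."""
    acc = per_class_accuracy(labels, predictions)
    hits = sum(1 for y, p in zip(labels, predictions) if y == p)
    average = hits / len(labels)
    worst_class = min(acc, key=acc.get)
    return average, worst_class, acc[worst_class]


# Toy data (made up): three classes with four examples each.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
predictions = [0, 0, 0, 0, 1, 1, 1, 0, 2, 0, 0, 0]

average, worst_class, worst_acc = average_and_worst(labels, predictions)
print(f"average accuracy: {average:.2f}")
print(f"worst class: {worst_class} with accuracy {worst_acc:.2f}")
```

In this toy example the average hides a class that is mostly misclassified, which is the kind of disparity the abstract describes. For adversarial evaluation, the predictions would be taken on perturbed inputs.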
