Learnable Boundary Guided Adversarial Training

23 Nov 2020  ·  Jiequan Cui, Shu Liu, LiWei Wang, Jiaya Jia

Previous adversarial training methods improve model robustness at the cost of accuracy on natural data. In this paper, our target is to reduce natural accuracy degradation... We use the logits from a clean model $\mathcal{M}^{natural}$ to guide the learning of the robust model $\mathcal{M}^{robust}$, on the grounds that logits from a well-trained clean model $\mathcal{M}^{natural}$ embed the most discriminative features of natural data, e.g., a generalizable classifier boundary. Our solution is to constrain the logits of the robust model $\mathcal{M}^{robust}$, which takes adversarial examples as input, to be similar to those of the clean model $\mathcal{M}^{natural}$ fed with the corresponding natural data. This lets $\mathcal{M}^{robust}$ inherit the classifier boundary of $\mathcal{M}^{natural}$. We therefore name our method Boundary Guided Adversarial Training (BGAT). Moreover, we generalize BGAT to Learnable Boundary Guided Adversarial Training (LBGAT) by training $\mathcal{M}^{natural}$ and $\mathcal{M}^{robust}$ simultaneously and collaboratively to learn the most robustness-friendly classifier boundary for the strongest robustness. Extensive experiments are conducted on CIFAR-10, CIFAR-100, and the challenging Tiny ImageNet dataset. When combined with other state-of-the-art adversarial training approaches, e.g., Adversarial Logit Pairing (ALP) and TRADES, performance is further enhanced.
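The logit constraint described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only says the two sets of logits should be "similar", so the mean squared difference used here is an assumption, as are the function names `logit_guidance_loss`, `cross_entropy`, and `joint_guidance_objective` and the `weight` parameter. The joint objective is one plausible reading of training $\mathcal{M}^{natural}$ and $\mathcal{M}^{robust}$ simultaneously, with a classification loss on the clean model added to the guidance term. Logits are plain Python lists so the example needs no external library.

```python
import math


def logit_guidance_loss(robust_logits_adv, natural_logits_clean):
    """Mean squared difference between two batches of logits.

    robust_logits_adv: logits of the robust model on adversarial inputs.
    natural_logits_clean: logits of the clean model on the matching natural inputs.
    The squared-error distance is an assumption made for illustration.
    """
    total, count = 0.0, 0
    for robust_row, natural_row in zip(robust_logits_adv, natural_logits_clean):
        for r, n in zip(robust_row, natural_row):
            total += (r - n) ** 2
            count += 1
    return total / count


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch, using a stable log-sum-exp."""
    total = 0.0
    for row, label in zip(logits, labels):
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


def joint_guidance_objective(robust_logits_adv, natural_logits_clean, labels, weight=1.0):
    """Hypothetical joint objective: clean-model classification loss plus
    a weighted guidance term pulling the robust logits toward the clean ones."""
    classification = cross_entropy(natural_logits_clean, labels)
    guidance = logit_guidance_loss(robust_logits_adv, natural_logits_clean)
    return classification + weight * guidance


if __name__ == "__main__":
    robust = [[0.0, 0.0]]   # robust model logits on one adversarial example
    natural = [[1.0, 3.0]]  # clean model logits on the matching natural example
    print(logit_guidance_loss(robust, natural))  # ((0-1)^2 + (0-3)^2) / 2 = 5.0
    print(joint_guidance_objective(robust, natural, labels=[1]))
```

In the fixed-boundary setting (BGAT in the abstract), only the guidance term would drive the robust model, since the clean model is already trained; in the joint setting both models would receive gradients. How the adversarial examples are generated is outside the scope of this sketch.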



Results from the Paper

Task                 Dataset    Model             Metric Name  Metric Value  Global Rank
Adversarial Defense  CIFAR-100  wideresnet-34-20  autoattack   62.55/30.20   # 1
Adversarial Defense  CIFAR-100  wideresnet-34-10  autoattack   70.25/27.16   # 1

