Dissecting Local Properties of Adversarial Examples

29 Sep 2021  ·  Lu Chen, Renjie Chen, Hang Guo, Yuan Luo, Quanshi Zhang, Yisen Wang ·

Adversarial examples have attracted significant attention over the years, yet a sufficient understanding of them is still lacking, especially regarding how they behave in combination with adversarial training. In this paper, we revisit some properties of adversarial examples from both frequency and spatial perspectives: 1) the distinctive high-frequency components of adversarial examples tend to mislead naturally-trained models while having little impact on adversarially-trained ones, and 2) adversarial examples show disorderly perturbations on naturally-trained models and locally-consistent (image-shape-related) perturbations on adversarially-trained ones. Motivated by these observations, we analyze how fragile models are to the generated adversarial perturbations, and propose a connection between model vulnerability and local intermediate response: a smaller local intermediate response comes with better adversarial robustness. Specifically, we demonstrate that 1) DNNs are naturally fragile, at least when the local response difference between adversarial and natural examples is large enough, and 2) smoother adversarially-trained models reduce this local response difference and thereby enhance robustness.
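To make the two properties described above concrete, here is a minimal, self-contained sketch (not the authors' code; the toy model, random data, FGSM attack, and frequency cutoff radius are placeholder assumptions) of how one might split an adversarial perturbation into low-/high-frequency components and probe the local intermediate response difference between natural and adversarial inputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a classifier (random weights, CIFAR-like input); purely illustrative.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
model.eval()

x = torch.rand(1, 3, 32, 32, requires_grad=True)  # placeholder "natural" image
y = torch.tensor([3])                              # placeholder label

# FGSM: a single signed-gradient step of size eps gives an adversarial perturbation.
eps = 8 / 255
loss = F.cross_entropy(model(x), y)
loss.backward()
delta = eps * x.grad.sign()

def frequency_split(p, radius=8):
    """Split a perturbation into low-/high-frequency parts with an FFT radial mask."""
    spec = torch.fft.fftshift(torch.fft.fft2(p), dim=(-2, -1))
    h, w = p.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    dist = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2).float().sqrt()
    low_mask = (dist <= radius).to(spec.dtype)   # keep only frequencies near DC
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * low_mask, dim=(-2, -1))).real
    return low, p - low

low, high = frequency_split(delta)
print("perturbation energy  low:", low.pow(2).sum().item(),
      " high:", high.pow(2).sum().item())

# Local intermediate response difference: compare an intermediate feature map
# on the natural vs. the adversarial input via a forward hook.
feats = {}
model[0].register_forward_hook(lambda m, i, o: feats.update(conv=o.detach()))

with torch.no_grad():
    model(x.detach())
    nat = feats["conv"]
    model(x.detach() + delta)
    adv = feats["conv"]
print("mean |adv - nat| intermediate response:", (adv - nat).abs().mean().item())
```

With a naturally-trained model one would expect the high-frequency part of the perturbation to carry most of the attack's effect, and an adversarially-trained model to show a smaller intermediate response gap; with the random weights above, the printed numbers only illustrate the measurement itself, not the paper's findings.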
