Disentangling Adversarial Robustness in Directions of the Data Manifold

1 Jan 2021 · Jiancong Xiao, Liusha Yang, Zhi-Quan Luo

Using generative models (GANs or VAEs) to craft adversarial examples, i.e., generative adversarial examples, has received increasing attention in recent years. Previous studies have shown experimentally that generative adversarial examples behave differently from regular adversarial examples in many aspects, such as attack rates, perceptibility, and generalization. However, few works focus on a theoretical analysis of the attack mechanisms of the two kinds of adversarial examples, and the reasons for the differences between regular and generative adversarial examples remain unclear. In this work, we provide a theoretical study of this problem and show that adversarial robustness can be disentangled in directions of the data manifold. Specifically, we find that: 1. Regular adversarial examples attack towards the small-variance directions of the data manifold, while generative adversarial examples attack towards the large-variance directions. 2. Standard adversarial training increases model robustness by extending the data manifold boundary in the small-variance directions, whereas adversarial training with generative adversarial examples increases model robustness by extending the data manifold boundary in the large-variance directions. These findings are based on an excess-risk and optimal-saddle-point analysis of the minimax problem of adversarial training with generative models. Although the theoretical results are derived under the assumption of a Gaussian mixture data model, our experiments demonstrate that these phenomena also occur on real datasets.
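The abstract's central claim concerns which variance directions of the data manifold a perturbation points along. A minimal sketch of how one might check this empirically, not taken from the paper's code, is to project a perturbation onto the principal directions of the data and compare the energy placed on leading (large-variance) versus trailing (small-variance) components; all names and the toy data below are illustrative assumptions.

```python
# Sketch: measure how an adversarial perturbation aligns with the
# principal (large-variance) vs. trailing (small-variance) directions
# of the data. Function and variable names are illustrative assumptions.

import numpy as np

def direction_variance_profile(data, perturbation):
    """Return (per-direction data variance, perturbation energy per direction),
    ordered from the largest-variance direction to the smallest."""
    X = data.reshape(len(data), -1)
    X = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal directions of the data.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    delta = perturbation.reshape(-1)
    coeffs = Vt @ delta                      # perturbation in the PCA basis
    return s**2 / len(X), coeffs**2

# Toy usage: per the paper's claim, a regular (e.g. PGD-style) perturbation
# should concentrate its energy on the trailing components, while the
# effective perturbation of a generative adversarial example should
# concentrate on the leading components.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 16 * 16))      # toy stand-in for a dataset
delta = 1e-2 * rng.normal(size=(16 * 16,))   # toy stand-in for a perturbation
variances, energy = direction_variance_profile(data, delta)
print("leading-direction energy:", energy[:10].sum())
print("trailing-direction energy:", energy[-10:].sum())
```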
