Improving Robustness with Optimal Transport based Adversarial Generalization

Deep networks have proven brittle against crafted adversarial examples. One of the main reasons is that the representations of adversarial examples diverge progressively from those of benign examples as they are fed forward through the higher layers of the network. To remedy susceptibility to adversarial examples, it is natural to mitigate this divergence. In this paper, leveraging the richness and rigor of optimal transport (OT) theory, we propose an OT-based adversarial generalization technique that strengthens the classifier against adversarial examples. The main idea of our proposed method is to minimize a specific Wasserstein (WS) distance between the adversarial and benign joint distributions on an intermediate layer of a deep net, which can further be interpreted, from a clustering view of OT, as a generalization technique. More specifically, by minimizing the WS distance of interest, an adversarial example is pushed toward the cluster of benign examples sharing the same label in the latent space, which strengthens the generalization ability of the classifier on adversarial examples. Our comprehensive experiments against state-of-the-art adversarial training and latent-space defense approaches show that our method is significantly superior under attacks of various distortion sizes. The results demonstrate improvements in robust accuracy of up to $5\%$ against the PGD attack on CIFAR-100 over the SOTA methods.
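No official code is linked for this paper, so the following is only a minimal, hypothetical PyTorch sketch of the general idea described in the abstract: an entropic-regularized (Sinkhorn) approximation of a Wasserstein distance between benign and adversarial intermediate-layer features, with a label-mismatch penalty in the ground cost so that transport pressure stays within same-label clusters, added to a standard adversarial training loss. All names and hyperparameters (`sinkhorn_ws`, `label_penalty`, `ot_weight`, the `features`/`model_head` split of the network) are illustrative assumptions, not the authors' implementation, and `x_adv` is assumed to come from an attack such as PGD on the current model.

```python
import math
import torch
import torch.nn.functional as F

def joint_cost(z_ben, z_adv, y_ben, y_adv, label_penalty=1e3):
    """Ground cost on the joint (feature, label) space: squared Euclidean
    distance between latents, plus a large penalty when labels differ, so
    mass is transported within same-label clusters (assumed form)."""
    C = torch.cdist(z_ben, z_adv, p=2) ** 2
    mismatch = (y_ben.unsqueeze(1) != y_adv.unsqueeze(0)).float()
    return C + label_penalty * mismatch

def sinkhorn_ws(C, eps=0.1, n_iters=100):
    """Entropic-regularized WS distance between two uniform empirical
    measures, computed with log-domain Sinkhorn iterations for stability."""
    n, m = C.shape
    log_mu = torch.full((n,), -math.log(n), device=C.device)
    log_nu = torch.full((m,), -math.log(m), device=C.device)
    f = torch.zeros(n, device=C.device)
    g = torch.zeros(m, device=C.device)
    for _ in range(n_iters):
        f = eps * (log_mu - torch.logsumexp((g.unsqueeze(0) - C) / eps, dim=1))
        g = eps * (log_nu - torch.logsumexp((f.unsqueeze(1) - C) / eps, dim=0))
    pi = torch.exp((f.unsqueeze(1) + g.unsqueeze(0) - C) / eps)  # transport plan
    return (pi * C).sum()

def ot_adversarial_loss(features, model_head, x_ben, x_adv, y, ot_weight=1.0):
    """Cross-entropy on adversarial examples plus the WS term between benign
    and adversarial latents at an intermediate layer (hypothetical split)."""
    z_ben, z_adv = features(x_ben), features(x_adv)
    ce = F.cross_entropy(model_head(z_adv), y)
    ws = sinkhorn_ws(joint_cost(z_ben.flatten(1), z_adv.flatten(1), y, y))
    return ce + ot_weight * ws
```

The label penalty in the ground cost is one simple way to approximate OT between joint (feature, label) distributions: with the penalty large relative to feature distances, the entropic plan moves essentially no mass across labels, so minimizing the WS term pulls each adversarial latent toward the benign cluster of its own class, matching the clustering interpretation given above.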
