Origins of Low-dimensional Adversarial Perturbations

25 Mar 2022 · Elvis Dohmatob, Chuan Guo, Morgane Goibert

In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs) in classification. Unlike the classical setting, these perturbations are confined to a subspace of dimension $k$ which is much smaller than the dimension $d$ of the feature space. The case $k=1$ corresponds to so-called universal adversarial perturbations (UAPs; Moosavi-Dezfooli et al., 2017). First, we consider binary classifiers under generic regularity conditions (including ReLU networks) and compute analytical lower bounds for the fooling rate of any subspace. These bounds explicitly highlight the dependence of the fooling rate on the pointwise margin of the model (i.e., the ratio of the model's output to the $L_2$ norm of its gradient at a test point), and on the alignment of the given subspace with the gradients of the model w.r.t. inputs. Our results provide a rigorous explanation for the recent success of heuristic methods for efficiently generating low-dimensional adversarial perturbations. Finally, we show that if a decision region is compact, then it admits a universal adversarial perturbation whose $L_2$ norm is $\sqrt{d}$ times smaller than the typical $L_2$ norm of a data point. Our theoretical results are confirmed by experiments on both synthetic and real data.
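To make the two quantities in the bounds concrete, here is a minimal PyTorch sketch, not code from the paper: it computes the pointwise margin $f(x)/\|\nabla_x f(x)\|_2$ of a binary classifier, and builds a $k$-dimensional perturbation subspace from the top singular vectors of stacked input gradients, one of the gradient-alignment heuristics the abstract alludes to. The scalar-output model `f`, the sample points `xs`, and the choice of `k` are illustrative assumptions.

```python
# Sketch of the margin and gradient-aligned-subspace quantities from the
# abstract. Assumes f: R^d -> R is a differentiable binary classifier whose
# predicted label is sign(f(x)).
import torch


def pointwise_margin(f, x):
    """Pointwise margin f(x) / ||grad_x f(x)||_2 at a test point x."""
    x = x.clone().requires_grad_(True)
    out = f(x)                                  # scalar logit
    (grad,) = torch.autograd.grad(out, x)
    return (out / grad.norm()).detach()


def gradient_subspace(f, xs, k):
    """Orthonormal basis (k, d) for a subspace aligned with the model's
    input gradients: the top-k right singular vectors of the gradient
    matrix stacked over the sample points xs."""
    grads = []
    for x in xs:
        x = x.clone().requires_grad_(True)
        (g,) = torch.autograd.grad(f(x), x)
        grads.append(g.flatten())
    G = torch.stack(grads)                      # shape (n, d)
    _, _, Vh = torch.linalg.svd(G, full_matrices=False)
    return Vh[:k]                               # rows span the subspace
```

Given such a basis $V$, the fooling rate of the subspace can be estimated empirically by searching within $\mathrm{span}(V)$, under a norm budget, for perturbations $v$ that flip $\mathrm{sign}(f(x+v))$ on held-out points.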
