A principled approach for generating adversarial images under non-smooth dissimilarity metrics
Deep neural networks perform well on real world data but are prone to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology not only for cases where the perturbations are measured by $\ell_p$ norms, but in fact any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, $\ell_1, \ell_2$, and $\ell_\infty$ perturbations; the $\ell_0$ counting "norm" (i.e. true sparseness); and the total variation seminorm, which is a (non-$\ell_p$) convolutional dissimilarity measuring local pixel changes. Our approach is a natural extension of a recent adversarial attack method, and eliminates the differentiability requirement of the metric. We demonstrate our algorithm, ProxLogBarrier, on the MNIST, CIFAR10, and ImageNet-1k datasets. We consider undefended and defended models, and show that our algorithm easily transfers to various datasets. We observe that ProxLogBarrier outperforms a host of modern adversarial attacks specialized for the $\ell_0$ case. Moreover, by altering images in the total variation seminorm, we shed light on a new class of perturbations that exploit neighboring pixel information.
PDF Abstract