Buffer Zone Based Defense against Adversarial Examples in Image Classification
Recent defenses published at venues such as NIPS, ICML, ICLR, and CVPR focus mainly on mitigating white-box attacks and do not properly account for adaptive adversaries. In this paper, we expand the scope of these defenses to include adaptive black-box adversaries. Based on our study of these defenses, we make three contributions. First, we propose a new metric for evaluating adversarial robustness when clean accuracy is impacted. Second, we create an enhanced adaptive black-box attack. Third, and most significantly, we develop a novel defense against these adaptive black-box attacks. Our defense combines deep neural networks with simple image transformations. While straightforward to implement, it yields a unique security property that we term buffer zones. We argue that our buffer-zone defense offers significant improvements over state-of-the-art defenses, and we verify our claims through extensive experimentation. Our results cover three adversarial models (10 different black-box attacks) against 11 defenses on two datasets (CIFAR-10 and Fashion-MNIST).
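As a rough illustration of the idea (a minimal sketch under our own assumptions; the class names, the specific shift transformation, and the unanimous-vote rule below are illustrative choices, not taken verbatim from the paper): several classifiers each receive the input through a distinct fixed image transformation, and the ensemble outputs a class only when every member agrees; any disagreement maps to a null label, and the input regions producing such disagreement act as buffer zones.

```python
import numpy as np

ABSTAIN = -1  # null label returned when the input falls inside a buffer zone

def shift_transform(dx, dy):
    """Return a fixed transformation that shifts an image by (dx, dy) pixels."""
    def apply(img):
        return np.roll(np.roll(img, dx, axis=0), dy, axis=1)
    return apply

class BufferZoneEnsemble:
    """Hypothetical buffer-zone style ensemble: one fixed transformation per member."""

    def __init__(self, classifiers, transforms):
        assert len(classifiers) == len(transforms)
        self.members = list(zip(classifiers, transforms))

    def predict(self, img):
        # Each member classifies its own transformed copy of the input.
        votes = [clf(t(img)) for clf, t in self.members]
        # Unanimous agreement -> output the class; any disagreement -> abstain.
        return votes[0] if len(set(votes)) == 1 else ABSTAIN

# Toy stand-in classifier (a placeholder for a trained deep neural network):
# thresholds the mean brightness of the image.
def toy_clf(img):
    return int(img.mean() > 0.5)

ensemble = BufferZoneEnsemble(
    classifiers=[toy_clf] * 3,
    transforms=[shift_transform(0, 0), shift_transform(1, 0), shift_transform(0, 1)],
)

img = np.random.rand(32, 32, 3)  # e.g., a CIFAR-10-sized input
print(ensemble.predict(img))     # class id, or -1 inside a buffer zone
```

In this sketch an adversarial perturbation must fool every transformed classifier simultaneously to change the output; otherwise the ensemble abstains rather than misclassify.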