Exploiting Safe Spots in Neural Networks for Preemptive Robustness and Out-of-Distribution Detection

1 Jan 2021 · Seungyong Moon, Gaon An, Hyun Oh Song

Recent advances in adversarial defense mainly focus on improving the classifier’s robustness against adversarially perturbed inputs. In this paper, we turn our attention from classifiers to inputs and explore whether there exist safe spots in the vicinity of natural images that are robust to adversarial attacks. To this end, we introduce a novel bi-level optimization algorithm that finds safe spots on over 90% of the correctly classified images for adversarially trained classifiers on the CIFAR-10 and ImageNet datasets. Our experiments also show that safe spots can be used to improve both the empirical and certified robustness of smoothed classifiers. Furthermore, by combining a novel safe-spot-inducing model training scheme with our safe spot generation method, we propose a new out-of-distribution detection algorithm that achieves state-of-the-art results on near-distribution outliers.
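
The abstract only states the bi-level objective at a high level, so the following is a minimal PyTorch sketch of one plausible instantiation, not the authors' implementation: the inner loop plays a PGD adversary around a candidate point, and the outer loop shifts the natural image within a small L-infinity ball to minimize that worst-case loss. The function names, step sizes, radii, and toy model below are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def inner_attack(model, x, y, eps, alpha, steps):
    """Inner maximization: an L-infinity PGD adversary around the candidate point x."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()


def find_safe_spot(model, x, y, eps_safe, eps_adv,
                   outer_steps=20, inner_steps=10, outer_lr=1 / 255):
    """Outer minimization: shift x within an eps_safe ball so that the
    worst-case (inner-PGD) loss around the shifted point becomes small."""
    model.eval()
    shift = torch.zeros_like(x, requires_grad=True)
    for _ in range(outer_steps):
        with torch.no_grad():                       # candidate safe spot handed to the adversary
            x_safe = (x + shift).clamp(0, 1)
        delta = inner_attack(model, x_safe, y, eps_adv,
                             2.5 * eps_adv / inner_steps, inner_steps)
        x_safe = (x + shift).clamp(0, 1)            # recompute with grad enabled for the outer step
        loss = F.cross_entropy(model(x_safe + delta), y)
        grad, = torch.autograd.grad(loss, shift)
        with torch.no_grad():
            shift -= outer_lr * grad.sign()         # descend on the worst-case loss
            shift.clamp_(-eps_safe, eps_safe)       # stay in the vicinity of the natural image
    return (x + shift).clamp(0, 1).detach()


# Illustrative usage with a toy CNN on random CIFAR-10-shaped data.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_safe = find_safe_spot(model, x, y, eps_safe=8 / 255, eps_adv=8 / 255)
```

The bi-level structure is what distinguishes this from standard adversarial-example generation: instead of perturbing the input to fool the classifier, the outer update moves the input toward a point whose entire adversarial neighborhood is classified correctly.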
