Avoiding Robust Misclassifications for Improved Robustness without Accuracy Loss

29 Sep 2021 · Yannick Merkli, Pavol Bielik, Petar Tsankov, Martin Vechev

While current methods for training robust deep learning models optimize robust accuracy, in practice the resulting models are often both robust and inaccurate on numerous samples, providing a false sense of safety for those samples. Further, they significantly reduce natural accuracy, which hinders their adoption in practice. In this work, we address both of these challenges by extending prior works in three main directions. First, we propose a new training method that jointly maximizes robust accuracy and minimizes robust inaccuracy. Second, since the resulting models are trained to be robust only if they are accurate, we leverage robustness as a principled abstain mechanism. Finally, this abstain mechanism allows us to combine models in a compositional architecture that significantly boosts overall robustness without sacrificing accuracy. We demonstrate the effectiveness of our approach on both empirical and certified robustness, using six recent state-of-the-art models and several datasets. Our results show that our method reduces the number of samples that are robust yet inaccurate by up to 97.28%. Further, it improves the $\epsilon_\infty = 1/255$ robustness of a state-of-the-art model from 26% to 86% while only marginally reducing its natural accuracy from 97.8% to 97.6%.
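To make the compositional idea concrete, the following is a minimal, hypothetical PyTorch sketch of the selection mechanism described in the abstract: a robustly trained model answers only when its prediction is robust at the given input; otherwise it abstains and the query is deferred to a standard, more accurate model. The names `is_robust_at` and `CompositionalClassifier`, the use of a small PGD attack as the robustness check, and the step-size heuristic are illustrative assumptions, not the paper's actual implementation; a certified variant would replace the empirical check with a verification procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def is_robust_at(model: nn.Module, x: torch.Tensor, eps: float, steps: int = 10) -> bool:
    # Hypothetical empirical robustness check: run a small PGD attack inside the
    # L-infinity ball of radius eps and report whether the prediction stays unchanged.
    # A certified variant would replace this with a verification procedure.
    model.eval()
    x = x.detach()
    with torch.no_grad():
        clean_pred = model(x).argmax(dim=1)

    delta = torch.zeros_like(x, requires_grad=True)
    step_size = 2.5 * eps / steps  # common PGD step-size heuristic (assumption)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), clean_pred)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad = None

    with torch.no_grad():
        adv_pred = model(x + delta).argmax(dim=1)
    return bool((adv_pred == clean_pred).all())


class CompositionalClassifier(nn.Module):
    # Hypothetical sketch of the compositional architecture: the robust model answers
    # only when its prediction is robust at the input; otherwise it abstains and the
    # query is forwarded to a standard, more accurate model.
    def __init__(self, robust_model: nn.Module, accurate_model: nn.Module, eps: float):
        super().__init__()
        self.robust_model = robust_model
        self.accurate_model = accurate_model
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: a single example with a batch dimension, e.g. shape (1, 3, 32, 32)
        if is_robust_at(self.robust_model, x, self.eps):
            return self.robust_model(x)   # robust and (by training) accurate: keep it
        return self.accurate_model(x)     # abstain on the robust branch: defer
```

Under this reading, the robust branch only ever answers on inputs where its prediction is robust, which is why jointly training it to be robust only when accurate lets robustness serve as a principled abstain signal rather than a blanket guarantee.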
