Improved Training of Certifiably Robust Models
Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical (PGD) robustness. In principle, relaxation can provide tight bounds if the convex relaxation solution is feasible for the original non-relaxed problem. Therefore, we propose two regularizers that can be used to train neural networks that yield convex relaxations with tighter bounds. In all of our experiments, the proposed regularizations result in tighter certification bounds than non-regularized baselines.
PDF Abstract