Efficient proximal mapping of the path-norm regularizer of shallow networks

We demonstrate two new important properties of the path-norm regularizer for shallow neural networks. First, despite its non-smoothness and non-convexity, it admits a closed-form proximal operator that can be computed efficiently, enabling the use of stochastic proximal-gradient-type methods for regularized empirical risk minimization. Second, it provides an upper bound on the Lipschitz constant of the network that is tighter than the trivial layer-wise product of Lipschitz constants, motivating its use for training networks robust to adversarial perturbations. Finally, in practical experiments we show that it provides a better robustness-accuracy trade-off than $\ell_1$-norm regularization or training with a layer-wise constraint on the Lipschitz constant.
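As a rough illustration of the quantity being regularized, here is a minimal sketch (not the authors' code) of the $\ell_1$ path-norm of a one-hidden-layer network $f(x) = v^\top \sigma(Wx)$, together with the naive layer-wise product bound mentioned in the abstract. The function names, the network shapes, and the particular choice of operator norms for the layer-wise baseline are illustrative assumptions; the paper's exact norm conventions may differ.

```python
import numpy as np

def path_norm(W, v):
    """l1 path-norm of a shallow network with hidden weights W (h x d) and
    output weights v (h,): the sum over all input-to-output paths of the
    absolute products of weights, sum_{j,i} |v_j| * |W_{j,i}|."""
    return float(np.sum(np.abs(v)[:, None] * np.abs(W)))

def layerwise_product(W, v):
    """Naive layer-wise Lipschitz bound: product of per-layer operator norms.
    Here we use the l_inf -> l_inf norm of W (max absolute row sum) times the
    l1 norm of v; this is one possible convention, assumed for illustration."""
    return float(np.max(np.sum(np.abs(W), axis=1)) * np.sum(np.abs(v)))

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 20))   # hidden-layer weights (h=50, d=20)
v = rng.standard_normal(50)         # output-layer weights

# The path-norm never exceeds the layer-wise product, illustrating the
# "tighter bound" claim from the abstract.
print("path-norm:          ", path_norm(W, v))
print("layer-wise product: ", layerwise_product(W, v))
```

Since $\sum_j |v_j|\,\|w_j\|_1 \le \|v\|_1 \max_j \|w_j\|_1$, the printed path-norm is always at most the layer-wise product for this choice of norms.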

ICML 2020