The Efficacy of $L_1$ Regularization in Two-Layer Neural Networks
A crucial problem in neural networks is to select the most appropriate number of hidden neurons and obtain tight statistical risk bounds. In this work, we present a new perspective towards the bias-variance tradeoff in neural networks. As an alternative to selecting the number of neurons, we theoretically show that $L_1$ regularization can control the generalization error and sparsify the input dimension. In particular, with an appropriate $L_1$ regularization on the output layer, the network can produce a statistical risk that is near minimax optimal. Moreover, an appropriate $L_1$ regularization on the input layer leads to a risk bound that does not involve the input data dimension. Our analysis is based on a new amalgamation of dimension-based and norm-based complexity analysis to bound the generalization error. A consequent observation from our results is that an excessively large number of neurons do not necessarily inflate generalization errors under a suitable regularization.
PDF Abstract