L2 Regularization
28 papers with code • 0 benchmarks • 0 datasets
See Weight Decay.
$L_{2}$ Regularization, or Weight Decay, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss function and a penalty on the $L_{2}$ norm of the weights:
$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$
where $\lambda$ is a value determining the strength of the penalty (encouraging smaller weights).
Weight decay can be incorporated directly into the weight update rule, rather than implicitly through the objective function. The term weight decay often refers to the implementation where the penalty is specified directly in the weight update rule, whereas L2 regularization usually refers to the implementation where it is specified in the objective function.
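As a minimal NumPy sketch of this distinction (the learning rate `lr` and penalty strength `lam` are illustrative choices, not values from the source): the L2-regularized update folds the penalty's gradient $2\lambda w$ into the loss gradient, while the weight-decay update shrinks the weights directly in the update rule. For plain SGD the two coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)      # current weights
grad = rng.normal(size=5)   # gradient of the original loss at w
lr, lam = 0.1, 0.01         # learning rate, penalty strength (illustrative)

# L2 regularization: the penalty lambda * w^T w is part of the objective,
# so its gradient 2 * lambda * w is added to the loss gradient.
w_l2 = w - lr * (grad + 2 * lam * w)

# Weight decay: the shrinkage is written directly into the update rule.
w_wd = (1 - 2 * lr * lam) * w - lr * grad

# For vanilla SGD the two formulations produce identical updates.
assert np.allclose(w_l2, w_wd)
```

Note that this equivalence holds only for plain SGD; for adaptive optimizers such as Adam, the two formulations generally differ, which is why some implementations expose decoupled weight decay separately.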
Latest papers with no code
Linking Neural Collapse and L2 Normalization with Improved Out-of-Distribution Detection in Deep Neural Networks
We propose a simple modification to standard ResNet architectures--L2 normalization over feature space--that substantially improves out-of-distribution (OoD) performance on the previously proposed Deep Deterministic Uncertainty (DDU) benchmark.
On the utility and protection of optimization with differential privacy and classic regularization techniques
According to the literature, this approach has proven to be a successful defence against several models' privacy attacks, but its downside is a substantial degradation of the models' performance.
Perturbation of Deep Autoencoder Weights for Model Compression and Classification of Tabular Data
Unlike dropout learning, the proposed weight perturbation routine additionally achieves 15% to 40% sparsity across six tabular data sets for the compression of deep pretrained models.
Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
L2 regularization for weights in neural networks is widely used as a standard training trick.
A Note on the Regularity of Images Generated by Convolutional Neural Networks
The regularity of images generated by convolutional neural networks, such as the U-net, generative networks, or the deep image prior, is analyzed.
A Closer Look at Rehearsal-Free Continual Learning
Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation.
Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning
In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.
Regularized Training of Nearest Neighbor Language Models
In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low frequency ones.
Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity
The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.
Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation
Meanwhile, we force the conventional decoder to simulate the behaviors of the seer decoder via knowledge distillation.