L2 Regularization
28 papers with code • 0 benchmarks • 0 datasets
See Weight Decay.
$L_{2}$ Regularization, also known as Weight Decay, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss and a penalty on the $L_{2}$ norm of the weights:
$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$
where $\lambda$ is a hyperparameter controlling the strength of the penalty (larger $\lambda$ encourages smaller weights).
Weight decay can also be incorporated directly into the weight update rule, rather than implicitly through the objective function. The term weight decay often refers to the implementation where the decay is specified directly in the update rule, whereas L2 regularization usually refers to the implementation specified in the objective function.
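The two formulations above can be compared on a toy problem. The sketch below (illustrative names, not from any specific library) trains a quadratic loss with plain gradient descent twice: once adding the gradient of the penalty $2\lambda w$ to the primary gradient (L2 regularization in the objective), and once shrinking the weights directly in the update rule (weight decay). For plain SGD the two are mathematically equivalent:

```python
import numpy as np

# Toy primary loss L(w) = ||Xw - y||^2 with penalty lam * w^T w.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
lam, lr = 0.1, 0.01  # penalty strength and learning rate (illustrative values)

def grad_original(w):
    # Gradient of the primary (unregularized) loss only.
    return 2 * X.T @ (X @ w - y)

w_l2 = np.zeros(3)
w_wd = np.zeros(3)
for _ in range(500):
    # L2 regularization: the penalty is part of the objective,
    # so its gradient 2*lam*w is added to the primary gradient.
    w_l2 = w_l2 - lr * (grad_original(w_l2) + 2 * lam * w_l2)
    # Weight decay: the same shrinkage written directly into the update rule.
    w_wd = (1 - 2 * lr * lam) * w_wd - lr * grad_original(w_wd)

print(np.allclose(w_l2, w_wd))  # True
```

Note that this equivalence holds only for vanilla SGD; with adaptive optimizers such as Adam, scaling the gradient couples the penalty to the adaptive step sizes, so decoupled weight decay and L2-in-the-loss produce different trajectories.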
Most implemented papers
Understanding and Stabilizing GANs' Training Dynamics with Control Theory
There are existing efforts that model the training dynamics of GANs in the parameter space but the analysis cannot directly motivate practically effective stabilizing methods.
Data and Model Dependencies of Membership Inference Attack
Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use.
Distributionally Robust Neural Networks
Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups.
Label-Only Membership Inference Attacks
We empirically show that our label-only membership inference attacks perform on par with prior attacks that required access to model confidences.
Neural Pruning via Growing Regularization
Regularization has long been utilized to learn sparsity in deep neural network pruning.
Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network
In this paper, we present an unsupervised image enhancement generative adversarial network (UEGAN), which learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner, rather than learning on a large number of paired images.
Learning with Hyperspherical Uniformity
Due to the over-parameterization nature, neural networks are a powerful tool for nonlinear function approximation.
The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective
Our analysis in this paper decouples capacity and width via the generalization of neural networks to Deep Gaussian Processes (Deep GP), a class of nonparametric hierarchical models that subsume neural nets.
Sequence Length is a Domain: Length-based Overfitting in Transformer Models
We demonstrate on a simple string editing task and a machine translation task that the Transformer model performance drops significantly when facing sequences of length diverging from the length distribution in the training data.
Disturbing Target Values for Neural Network Regularization
This active regularization makes use of the model behavior during training to regularize it in a more directed manner.