Weight Decay

Weight Decay, or $L_{2}$ Regularization, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss function and a penalty on the squared $L_{2}$ norm of the weights:

$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$

where $\lambda$ is a non-negative hyperparameter that sets the strength of the penalty; larger values encourage smaller weights.
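As a rough illustration of the penalized objective above, the following sketch evaluates $L_{original}(w) + \lambda w^{T}w$ for a small weight vector. The names `l2_penalized_loss` and `toy_loss`, the target values, and the choice of a squared-error primary loss are all assumptions made for this example; they do not come from any particular library.

```python
def l2_penalized_loss(original_loss, w, lam):
    """Return L_original(w) + lam * w^T w for a weight vector given as a list."""
    penalty = lam * sum(wi * wi for wi in w)
    return original_loss(w) + penalty


def toy_loss(w):
    # Hypothetical primary loss: squared distance to a fixed target vector.
    target = [1.0, -2.0]
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


w = [3.0, 4.0]
print(toy_loss(w))                          # primary loss only
print(l2_penalized_loss(toy_loss, w, 0.1))  # primary loss plus 0.1 * (3^2 + 4^2)
```

With `lam = 0` the penalized loss reduces to the primary loss, and increasing `lam` raises the cost of large weights relative to the primary loss.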

Weight decay can be incorporated directly into the weight update rule, rather than only implicitly through the objective function. Often "weight decay" refers to the implementation where the decay is specified directly in the weight update rule, whereas "$L_{2}$ regularization" usually refers to the implementation where the penalty is specified in the objective function.
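To make the two formulations concrete, consider plain gradient descent with a learning rate $\eta$ (a symbol assumed here, not defined in the text above). The gradient of $\lambda w^{T}w$ is $2\lambda w$, so a step on the penalized objective is $w \leftarrow w - \eta\left(\nabla L_{original}(w) + 2\lambda w\right) = \left(1 - 2\eta\lambda\right)w - \eta\nabla L_{original}(w)$. The sketch below compares the two forms numerically. The function names and the toy gradient are hypothetical, and the equivalence shown is specific to plain gradient descent; it need not hold for other optimizers.

```python
def grad_toy_loss(w):
    # Gradient of a hypothetical primary loss sum((w_i - t_i)^2).
    target = [1.0, -2.0]
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


def step_l2_in_objective(w, grad_fn, lr, lam):
    # Gradient step on L_original(w) + lam * w^T w; the penalty contributes 2 * lam * w.
    g = grad_fn(w)
    return [wi - lr * (gi + 2.0 * lam * wi) for wi, gi in zip(w, g)]


def step_decay_in_update(w, grad_fn, lr, lam):
    # Scale the weights by (1 - 2 * lr * lam) and subtract a step on L_original alone,
    # with the gradient evaluated at the original weights.
    g = grad_fn(w)
    return [(1.0 - 2.0 * lr * lam) * wi - lr * gi for wi, gi in zip(w, g)]


w = [3.0, 4.0]
a = step_l2_in_objective(w, grad_toy_loss, lr=0.1, lam=0.01)
b = step_decay_in_update(w, grad_toy_loss, lr=0.1, lam=0.01)
print(a)
print(b)
```

Both functions should return the same weights up to floating-point rounding, which is why the two terms are often used interchangeably in the plain gradient descent setting.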

Image Source: Deep Learning, Goodfellow et al

Tasks

Task	Papers	Share
Retrieval	78	9.79%
Language Modelling	69	8.66%
Question Answering	47	5.90%
Large Language Model	41	5.14%
Sentence	23	2.89%
In-Context Learning	23	2.89%
Text Generation	22	2.76%
Information Retrieval	18	2.26%
Code Generation	18	2.26%

Categories

Regularization

Parameter Norm Penalties