Activation Regularization

Introduced by Merity et al. in Revisiting Activation Regularization for Language RNNs

Activation Regularization (AR), or $L_{2}$ activation regularization, is regularization performed on activations as opposed to weights. It is usually used in conjunction with RNNs. It is defined as:

$$\alpha \, L_{2}\left(m \circ h_{t}\right)$$

where $m$ is the dropout mask applied by later parts of the model, $L_{2}$ is the $L_{2}$ norm, $h_{t}$ is the output of the RNN at timestep $t$, and $\alpha$ is a scaling coefficient.

When applied to the output of a dense layer, AR penalizes activations that deviate substantially from 0, encouraging them to remain small.
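The penalty above can be sketched in a few lines of NumPy. This is an illustrative implementation of the formula as written (the function name `ar_penalty` and the default `alpha` are assumptions, not from the paper's code):

```python
import numpy as np

def ar_penalty(h_t, mask, alpha=2.0):
    """Activation Regularization: alpha * L2(m ∘ h_t).

    h_t   -- RNN output at timestep t (1-D array of activations)
    mask  -- dropout mask used by later parts of the model
    alpha -- scaling coefficient for the penalty
    """
    dropped = mask * h_t              # elementwise product m ∘ h_t
    return alpha * np.sqrt(np.sum(dropped ** 2))  # alpha * L2 norm

# Example: penalize the masked activations of a hidden state
h_t = np.array([3.0, 4.0, -1.0])
mask = np.array([1.0, 1.0, 0.0])      # third unit dropped
penalty = ar_penalty(h_t, mask, alpha=1.0)  # L2 of [3, 4, 0] -> 5.0
```

In practice this scalar is added to the task loss during training; because only the masked (surviving) activations are penalized, the regularizer acts on the same activations the rest of the network actually sees.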

Source: Revisiting Activation Regularization for Language RNNs



| Task | Papers | Share |
| --- | --- | --- |
| Language Modelling | 20 | 17.86% |
| General Classification | 14 | 12.50% |
| Text Classification | 13 | 11.61% |
| Classification | 8 | 7.14% |
| Sentiment Analysis | 8 | 7.14% |
| Language Identification | 4 | 3.57% |
| Translation | 4 | 3.57% |
| Hate Speech Detection | 3 | 2.68% |
| Sentence | 3 | 2.68% |
