# Activation Regularization

Introduced by Merity et al. in *Revisiting Activation Regularization for Language RNNs*.

Activation Regularization (AR), or $L_{2}$ activation regularization, is regularization performed on activations as opposed to weights. It is usually used in conjunction with RNNs. It is defined as:

$$\alpha{L}_{2}\left(m\circ{h_{t}}\right)$$

where $m$ is a dropout mask used by later parts of the model, $L_{2}$ is the $L_{2}$ norm, $h_{t}$ is the output of the RNN at timestep $t$, and $\alpha$ is a scaling coefficient.

When applied to the output of a dense layer, AR penalizes activations that deviate substantially from 0, encouraging them to remain small.
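The penalty above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name and the default value of `alpha` are assumptions, and the dropout mask is taken as given (it would normally come from the dropout applied after the RNN layer).

```python
import numpy as np

def activation_regularization(h_t, dropout_mask, alpha=1.0):
    """AR penalty: alpha * L2-norm of the masked activations,
    i.e. alpha * ||m ∘ h_t||_2, added to the training loss.

    h_t          -- RNN output at timestep t (1-D array of activations)
    dropout_mask -- the mask m reused from the model's dropout layer
    alpha        -- scaling coefficient (default here is illustrative)
    """
    return alpha * np.linalg.norm(dropout_mask * h_t)

# Example: two active units with activations 3 and 4; the masked
# activation vector has L2 norm 5, so the penalty is alpha * 5.
h = np.array([3.0, 4.0, -7.0])
m = np.array([1.0, 1.0, 0.0])  # third unit dropped out
penalty = activation_regularization(h, m, alpha=1.0)  # → 5.0
```

In training, this scalar is simply added to the task loss, so gradients push the (non-dropped) activations toward zero.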
