Activation Regularization

Introduced by Merity et al. in Revisiting Activation Regularization for Language RNNs

Activation Regularization (AR), or $L_{2}$ activation regularization, is regularization performed on activations as opposed to weights. It is usually used in conjunction with RNNs. It is defined as:

$$\alpha L_{2}\left(m \circ h_{t}\right)$$

where $m$ is a dropout mask used by later parts of the model, $\circ$ denotes element-wise multiplication, $L_{2}$ is the $L_{2}$ norm, $h_{t}$ is the output of an RNN at timestep $t$, and $\alpha$ is a scaling coefficient.

When applied to the output of a dense layer, AR penalizes activations that are far from 0, encouraging them to remain small.
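
As a minimal sketch of the formula above, the penalty for a single timestep can be computed in plain Python. The function name `activation_regularization`, the example values, and the use of a binary (unscaled) dropout mask are assumptions made for illustration; they are not taken from the paper or from any particular library.

```python
import math


def activation_regularization(h_t, mask, alpha):
    """Return alpha * L2(mask * h_t) for one timestep.

    h_t   -- list of RNN output activations at timestep t
    mask  -- dropout mask of the same length (assumed binary here)
    alpha -- scaling coefficient
    """
    if len(h_t) != len(mask):
        raise ValueError("h_t and mask must have the same length")
    # Element-wise product m * h_t
    masked = [m * h for m, h in zip(mask, h_t)]
    # L2 norm of the masked activations, scaled by alpha
    return alpha * math.sqrt(sum(v * v for v in masked))


# Hypothetical example: the third unit is dropped, so it adds no penalty.
h_t = [3.0, -4.0, 12.0]
mask = [1.0, 1.0, 0.0]
penalty = activation_regularization(h_t, mask, alpha=2.0)
print(penalty)  # 10.0
```

In this sketch the penalty would be added to the training loss, so larger surviving activations increase the loss while all-zero activations contribute nothing.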

Source: Revisiting Activation Regularization for Language RNNs

Tasks

Task	Papers	Share
Language Modelling	20	17.86%
General Classification	14	12.50%
Text Classification	13	11.61%
Classification	8	7.14%
Sentiment Analysis	8	7.14%
Language Identification	4	3.57%
Translation	4	3.57%
Hate Speech Detection	3	2.68%
Sentence	3	2.68%

Categories

Regularization