Activation Regularization (AR), or $L_{2}$ activation regularization, is regularization performed on activations as opposed to weights. It is usually used in conjunction with RNNs. It is defined as:
$$\alpha \, L_{2}\left(m \circ h_{t}\right)$$
where $m$ is the dropout mask used by later parts of the model, $L_{2}$ is the $L_{2}$ norm, $h_{t}$ is the output of the RNN at timestep $t$, and $\alpha$ is a scaling coefficient.
When applied to the output of a dense layer, AR penalizes activations that are substantially far from 0, encouraging them to remain small.
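The penalty is straightforward to add to a training loop. Below is a minimal PyTorch sketch (the framework, `alpha`, and the dropout probability are illustrative assumptions, not values from the paper) that applies a dropout mask to the RNN output and adds $\alpha$ times the mean squared activation, one common way to realize the $L_{2}$ term, to the task loss.

```python
import torch
import torch.nn as nn

def activation_regularization(h, dropout, alpha):
    """AR penalty: alpha * L2(m ∘ h_t).

    h       : RNN output, shape (seq_len, batch, hidden_size)
    dropout : nn.Dropout module supplying the mask m
    alpha   : scaling coefficient
    The L2 term is taken here as the mean of squared (dropped-out)
    activations, one common convention.
    """
    return alpha * dropout(h).pow(2).mean()

# Illustrative usage with an LSTM and a placeholder task loss.
rnn = nn.LSTM(input_size=10, hidden_size=20)
ar_dropout = nn.Dropout(p=0.5)       # mask m (assumed probability)
alpha = 2.0                          # assumed scaling coefficient

x = torch.randn(5, 3, 10)            # (seq_len, batch, input_size)
output, _ = rnn(x)

task_loss = output.sum()             # placeholder for the real task loss
loss = task_loss + activation_regularization(output, ar_dropout, alpha)
loss.backward()
```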
Source: Revisiting Activation Regularization for Language RNNs
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 20 | 13.61% |
| Language Modeling | 18 | 12.24% |
| Text Classification | 15 | 10.20% |
| General Classification | 14 | 9.52% |
| Sentiment Analysis | 9 | 6.12% |
| Classification | 8 | 5.44% |
| Language Identification | 4 | 2.72% |
| Translation | 4 | 2.72% |
| Decision Making | 3 | 2.04% |