# Temporal Activation Regularization

Introduced by Merity et al. in Revisiting Activation Regularization for Language RNNs

Temporal Activation Regularization (TAR) is a type of slowness regularization for RNNs that penalizes large differences between the hidden states at consecutive timesteps, encouraging the hidden state to change slowly over time. Formally, we minimize:

$$\beta{L_{2}}\left(h_{t} - h_{t+1}\right)$$

where $L_{2}$ is the $L_{2}$ norm, $h_{t}$ is the output of the RNN at timestep $t$, and $\beta$ is a scaling coefficient.
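
As a rough illustration of the penalty above, the following pure-Python sketch computes $\beta L_{2}(h_{t} - h_{t+1})$ for each pair of consecutive outputs and averages the result over the sequence. The function name `tar_penalty`, the list-of-lists representation of the hidden states, and the choice to average over timesteps (rather than sum) are assumptions made for this example and are not taken from the source; in practice the term would be added to the task loss inside a deep learning framework.

```python
import math


def tar_penalty(hidden_states, beta):
    """Hypothetical helper: TAR penalty for one sequence of RNN outputs.

    hidden_states: list of equal-length vectors, one per timestep (h_1 ... h_T).
    beta: scaling coefficient for the penalty.
    Assumption: the per-step penalties are averaged over the T - 1 pairs.
    """
    if len(hidden_states) < 2:
        # No consecutive pair exists, so there is nothing to penalize.
        return 0.0

    total = 0.0
    for h_t, h_next in zip(hidden_states, hidden_states[1:]):
        # L2 norm of the difference between consecutive outputs.
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(h_t, h_next)))

    return beta * total / (len(hidden_states) - 1)


# Toy sequence: one large jump followed by no change.
states = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
penalty = tar_penalty(states, beta=2.0)
```

In the toy sequence, the first transition contributes a norm of 5 and the second contributes 0, so a sequence whose state stops changing incurs a smaller penalty than one that keeps jumping.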

