Regularization

GradDrop, or Gradient Sign Dropout, is a probabilistic masking procedure that samples gradients at an activation layer based on their level of sign consistency. It is applied as a layer in any standard network forward pass, usually on the final layer before the prediction head, to save on compute overhead and to maximize benefits during backpropagation. Below, we develop the GradDrop formalism. Throughout, $\circ$ denotes elementwise multiplication after any necessary tiling operations are completed. To implement GradDrop, we first define the Gradient Positive Sign Purity, $\mathcal{P}$, as

$$ \mathcal{P}=\frac{1}{2}\left(1+\frac{\sum_{i} \nabla L_{i}}{\sum_{i}\left|\nabla L_{i}\right|}\right) $$

$\mathcal{P}$ is bounded by $[0,1]$. For multiple gradient values $\nabla_{a} L_{i}$ at some scalar $a$, we see that $\mathcal{P}=0$ if $\nabla_{a} L_{i}<0\ \forall i$, while $\mathcal{P}=1$ if $\nabla_{a} L_{i}>0\ \forall i$. Thus, $\mathcal{P}$ is a measure of how many positive gradients are present at any given value. We then form a mask for each gradient $\mathcal{M}_{i}$ as follows:
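
For concreteness, here is a minimal NumPy sketch (an illustrative assumption, not the paper's released code; the function name and the small `eps` term are ours) of how $\mathcal{P}$ could be computed from a stack of per-task gradients at one activation:

```python
import numpy as np

def positive_sign_purity(grads, eps=1e-8):
    """grads: array of shape (K, ...) holding the K task gradients at the same
    activation tensor. Returns P elementwise over the activation's shape."""
    num = np.sum(grads, axis=0)                  # sum_i grad_i
    den = np.sum(np.abs(grads), axis=0) + eps    # sum_i |grad_i|; eps avoids 0/0
    return 0.5 * (1.0 + num / den)

# Two tasks: they agree in sign on the first and third entries, disagree on the second.
g = np.array([[1.0,  2.0, -0.5],
              [0.5, -2.0, -0.5]])
print(positive_sign_purity(g))   # approximately [1.0, 0.5, 0.0]
```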

$$ \mathcal{M}_{i}=\mathcal{I}[f(\mathcal{P})>U] \circ \mathcal{I}\left[\nabla L_{i}>0\right]+\mathcal{I}[f(\mathcal{P})<U] \circ \mathcal{I}\left[\nabla L_{i}<0\right] $$

for $\mathcal{I}$ the standard indicator function and $f$ some monotonically increasing function (often just the identity) that maps $[0,1] \mapsto [0,1]$ and is odd around $(0.5,0.5)$. $U$ is a tensor of i.i.d. $U(0,1)$ random variables with the same shape as $\mathcal{P}$. The masks $\mathcal{M}_{i}$ are then used to produce the final gradient $\sum_{i} \mathcal{M}_{i} \nabla L_{i}$.
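
Continuing the sketch above, the mask and combined gradient might be implemented as follows (again an illustrative NumPy assumption with $f$ taken as the identity; `graddrop_combine` is a hypothetical name, not an API from the paper):

```python
import numpy as np

def graddrop_combine(grads, rng=None, eps=1e-8):
    """grads: array of shape (K, ...) holding the K task gradients at one activation.
    Returns the GradDrop-combined gradient sum_i M_i * grad_i."""
    rng = np.random.default_rng() if rng is None else rng
    # Gradient Positive Sign Purity P (same formula as in the sketch above).
    p = 0.5 * (1.0 + np.sum(grads, axis=0) / (np.sum(np.abs(grads), axis=0) + eps))
    u = rng.uniform(size=p.shape)            # tensor of i.i.d. U(0,1) variables
    keep_pos = p > u                         # I[f(P) > U] with f = identity
    keep_neg = ~keep_pos                     # I[f(P) < U]
    masks = keep_pos * (grads > 0) + keep_neg * (grads < 0)   # M_i for each task
    return np.sum(masks * grads, axis=0)     # final gradient sum_i M_i * grad_i

# Usage: where the tasks disagree in sign (second entry), only one sign survives,
# chosen at random with probability governed by P.
g = np.array([[1.0,  2.0, -0.5],
              [0.5, -2.0, -0.5]])
print(graddrop_combine(g))
```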

Source: Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
