One difficulty that arises when optimizing deep neural networks is that large parameter gradients can lead an SGD optimizer to take a very large step into a region where the loss is much higher, effectively undoing much of the work that was needed to reach the current solution.
Gradient Clipping limits the size of the gradients so that optimization behaves more reasonably near sharp regions of the loss surface. It can be performed in a number of ways. One option is to clip the parameter gradient element-wise before a parameter update. Another is to clip the norm $||\textbf{g}||$ of the gradient $\textbf{g}$ before a parameter update:
$$\text{if } ||\textbf{g}|| > v \text{ then } \textbf{g} \leftarrow \frac{\textbf{g}v}{||\textbf{g}||}$$
where $v$ is a norm threshold.
Source: Deep Learning, Goodfellow et al.
Image Source: Pascanu et al.
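Both variants can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation from the sources above; the function names `clip_by_value` and `clip_by_norm`, the threshold `v=5.0`, and the toy gradient values are all made up for the example.

```python
import numpy as np

def clip_by_value(g, clip_value):
    # Element-wise clipping: each component is limited to [-clip_value, clip_value].
    return np.clip(g, -clip_value, clip_value)

def clip_by_norm(g, v):
    # Norm clipping: if ||g|| > v, rescale g to g * v / ||g||, so its norm becomes v
    # while its direction is preserved.
    norm = np.linalg.norm(g)
    if norm > v:
        return g * (v / norm)
    return g

# Example SGD step with norm clipping (values chosen only for illustration).
theta = np.array([0.5, -1.2, 3.0])
g = np.array([10.0, -40.0, 25.0])   # a large gradient near a sharp region
lr = 0.01
g = clip_by_norm(g, v=5.0)          # ||g|| is rescaled down to 5.0
theta -= lr * g
```

In practice, deep learning frameworks provide built-in utilities for this, e.g. `torch.nn.utils.clip_grad_norm_` and `torch.nn.utils.clip_grad_value_` in PyTorch, which apply the same idea across all parameter gradients of a model.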
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 8 | 8.42% |
| Reinforcement Learning (RL) | 8 | 8.42% |
| Federated Learning | 7 | 7.37% |
| Image Classification | 6 | 6.32% |
| Text Generation | 4 | 4.21% |
| Object Detection | 3 | 3.16% |
| General Classification | 3 | 3.16% |
| Semantic Segmentation | 2 | 2.11% |
| Self-Supervised Learning | 2 | 2.11% |