Cosine Annealing is a type of learning rate schedule that starts with a large learning rate, decreases it relatively rapidly to a minimum value, and then increases it rapidly again. Each reset of the learning rate acts like a simulated restart of the learning process, and the re-use of good weights as the starting point of the restart is referred to as a "warm restart", in contrast to a "cold restart", where a new set of small random numbers may be used as the starting point.
$$\eta_{t} = \eta_{min}^{i} + \frac{1}{2}\left(\eta_{max}^{i}-\eta_{min}^{i}\right)\left(1+\cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right) $$
where $\eta_{min}^{i}$ and $\eta_{max}^{i}$ are the bounds of the learning rate range for the $i$-th run, and $T_{cur}$ accounts for how many epochs have been performed since the last restart.
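The schedule can be implemented directly from the formula above. Below is a minimal Python sketch; the function name, the `eta_min`/`eta_max` defaults, and the 10-epoch cycle length are illustrative assumptions, not values from the paper:

```python
import math

def cosine_annealed_lr(t_cur, t_i, eta_min=0.0, eta_max=0.1):
    """eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

# Warm restarts: when a cycle of T_i epochs ends, T_cur resets to 0, so the
# learning rate jumps back to eta_max while training continues from the
# current weights rather than from a fresh random initialisation.
cycle_length = 10  # T_i, an assumed value for illustration
for epoch in range(25):
    t_cur = epoch % cycle_length  # epochs since the last restart
    lr = cosine_annealed_lr(t_cur, cycle_length)
    print(f"epoch {epoch:2d}: lr = {lr:.4f}")
```

In practice, deep learning frameworks ship this schedule built in; PyTorch, for example, provides `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.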
Text Source: Jason Brownlee
Image Source: Gao Huang
Source: SGDR: Stochastic Gradient Descent with Warm Restarts
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 50 | 6.00% |
| Language Modeling | 49 | 5.88% |
| Large Language Model | 39 | 4.68% |
| Question Answering | 25 | 3.00% |
| RAG | 24 | 2.88% |
| Retrieval-augmented Generation | 21 | 2.52% |
| Retrieval | 21 | 2.52% |
| Sentiment Analysis | 19 | 2.28% |
| Decision Making | 18 | 2.16% |