Learning Rate Schedules

Inverse Square Root Schedule

Inverse Square Root is a learning rate schedule $1 / \sqrt{\max\left(n, k\right)}$, where $n$ is the current training iteration and $k$ is the number of warm-up steps. This holds the learning rate constant at $1 / \sqrt{k}$ for the first $k$ steps, then decays it in proportion to $1 / \sqrt{n}$ (a polynomial decay, not an exponential one) until pre-training is over.
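As a minimal sketch of the formula above, the schedule can be written as a plain Python function. The function name `inverse_sqrt_lr` and the `scale` multiplier are assumptions made for illustration and are not part of the definition; with `scale=1.0` the function returns exactly $1 / \sqrt{\max\left(n, k\right)}$.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int, scale: float = 1.0) -> float:
    """Return scale / sqrt(max(step, warmup_steps)).

    `step` is the current training iteration (n) and `warmup_steps` is the
    number of warm-up steps (k). `scale` is a hypothetical multiplier added
    for convenience; the schedule as defined corresponds to scale=1.0.
    """
    if warmup_steps < 1:
        # Guard against division by zero when both step and warmup_steps are 0.
        raise ValueError("warmup_steps must be at least 1")
    return scale / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    k = 100
    for n in (1, 50, 100, 400, 1600):
        print(n, inverse_sqrt_lr(n, k))
```

With $k = 100$, the rate stays at 0.1 through step 100, then falls to 0.05 at step 400 and 0.025 at step 1600: quadrupling the step count halves the rate.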


Tasks


| Task | Papers | Share |
| Language Modelling | 98 | 8.98% |
| Question Answering | 61 | 5.59% |
| Decoder | 51 | 4.67% |
| Text Generation | 43 | 3.94% |
| Sentence | 43 | 3.94% |
| Retrieval | 37 | 3.39% |
| Translation | 29 | 2.66% |
| Machine Translation | 25 | 2.29% |
| Natural Language Understanding | 21 | 1.92% |

