Inverse Square Root is a learning rate schedule $1 / \sqrt{\max\left(n, k\right)}$, where $n$ is the current training iteration and $k$ is the number of warm-up steps. This sets a constant learning rate of $1 / \sqrt{k}$ for the first $k$ steps, then decays the learning rate in proportion to $1 / \sqrt{n}$ (a polynomial decay, not an exponential one) until pre-training is over.
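As a minimal sketch, the schedule can be written as a single function of the step count. The function name `inverse_sqrt_lr` and the optional `scale` multiplier are assumptions added for illustration; the definition above corresponds to `scale = 1.0`.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int, scale: float = 1.0) -> float:
    """Return scale / sqrt(max(step, warmup_steps)).

    `scale` is a hypothetical multiplier, not part of the definition above.
    """
    if warmup_steps < 1:
        # Guard against division by zero when step is also 0.
        raise ValueError("warmup_steps must be at least 1")
    return scale / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    k = 100  # number of warm-up steps (illustrative value)
    for n in (1, 50, 100, 400, 10_000):
        # Constant for n <= k, then proportional to 1 / sqrt(n).
        print(n, inverse_sqrt_lr(n, k))
```

With `k = 100`, every step up to 100 yields the same rate, and quadrupling the step count after warm-up halves it, which is what distinguishes this decay from an exponential one.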
Task | Papers | Share |
---|---|---|
Language Modelling | 78 | 9.30% |
Question Answering | 61 | 7.27% |
Text Generation | 45 | 5.36% |
Retrieval | 26 | 3.10% |
Natural Language Understanding | 23 | 2.74% |
Machine Translation | 22 | 2.62% |
Semantic Parsing | 19 | 2.26% |
Abstractive Text Summarization | 18 | 2.15% |
Natural Language Inference | 16 | 1.91% |