Pythia is a suite of decoder-only autoregressive language models, all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. The model architecture and hyperparameters largely follow GPT-3, with a few notable deviations based on recent advances in best practices for large-scale language modeling.
Source: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
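As a minimal sketch of how the suite is typically used (assuming the checkpoints are hosted on the Hugging Face Hub under `EleutherAI/pythia-<size>`, with intermediate training checkpoints exposed as `step<N>` revisions, as the Pythia repository documents):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: checkpoints are published on the Hugging Face Hub as
# "EleutherAI/pythia-<size>", and intermediate training checkpoints
# are available as revisions named "step<N>" (e.g. "step3000").
MODEL = "EleutherAI/pythia-70m"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, revision="step3000")

# Sample a short continuation to sanity-check the loaded checkpoint.
inputs = tokenizer("Pythia models were trained on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because every model in the suite saw the same data in the same order, loading the same `step<N>` revision across sizes lets an analysis isolate the effect of scale from the effect of training data.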
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 22 | 25.29% |
| Language Modeling | 14 | 16.09% |
| Memorization | 7 | 8.05% |
| Large Language Model | 5 | 5.75% |
| Question Answering | 4 | 4.60% |
| Text Generation | 3 | 3.45% |
| Computational Efficiency | 2 | 2.30% |
| LAMBADA | 2 | 2.30% |
| Machine Translation | 2 | 2.30% |