Chinchilla is a 70B-parameter language model trained compute-optimally on 1.4 trillion tokens. The paper's findings suggest that, for a fixed compute budget, models are trained optimally by scaling model size and the number of training tokens in equal proportion. Chinchilla uses the same compute budget as Gopher (the two are trained for the same number of FLOPs) but with 4x more training data. It is trained on MassiveText with a slightly modified SentencePiece tokenizer. More architectural details are in the paper.
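The equal-scaling rule above can be sketched numerically. Assuming the common approximation that training cost is C ≈ 6·N·D FLOPs (N parameters, D tokens), and the roughly constant ~20 tokens-per-parameter ratio implied by Chinchilla's headline numbers (70B parameters, 1.4T tokens), the compute-optimal N and D for a given budget follow directly; the function name and the exact ratio are illustrative assumptions, not from the paper's fitted scaling laws:

```python
import math

def compute_optimal(flops_budget, tokens_per_param=20.0):
    """Estimate compute-optimal parameter count N and token count D for a
    FLOP budget C, using the approximation C ~= 6*N*D.

    tokens_per_param (~20) is an assumption read off Chinchilla's headline
    result (1.4e12 tokens / 70e9 params), not the paper's fitted exponents.
    """
    # With D = r*N and C = 6*N*D = 6*r*N^2, solve for N:
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's approximate budget: 6 * 70e9 params * 1.4e12 tokens
n, d = compute_optimal(6 * 70e9 * 1.4e12)
print(f"params: {n/1e9:.0f}B, tokens: {d/1e12:.1f}T")
```

Note that doubling the budget under this rule increases both N and D by √2, which is the "equal scaling" behavior: half the extra compute goes to a bigger model, half to more data.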
Source: Training Compute-Optimal Large Language Models
Task | Papers | Share |
---|---|---|
Language Modelling | 6 | 5.88% |
Question Answering | 4 | 3.92% |
Large Language Model | 3 | 2.94% |
Multi-task Language Understanding | 3 | 2.94% |
Anachronisms | 2 | 1.96% |
Common Sense Reasoning | 2 | 1.96% |
Mathematical Reasoning | 2 | 1.96% |
Multiple Choice Question Answering (MCQA) | 2 | 1.96% |
Word Sense Disambiguation | 2 | 1.96% |