Chinchilla is a 70B-parameter language model trained compute-optimally on 1.4 trillion tokens. The paper's findings suggest that, for a fixed compute budget, models are trained optimally by scaling model size and the number of training tokens in equal proportion. Chinchilla uses the same compute budget as Gopher (the two are trained for the same number of FLOPs) but with 4x more training data. It is trained on MassiveText with a slightly modified SentencePiece tokenizer. More architectural details are in the paper.
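The equal-scaling rule above can be sketched numerically. Assuming the common approximation that training cost is C ≈ 6·N·D FLOPs (N parameters, D tokens), and the roughly constant ~20 tokens-per-parameter ratio implied by Chinchilla's headline numbers (70B parameters, 1.4T tokens), the compute-optimal N and D for a given budget follow directly; the function name and the exact ratio are illustrative assumptions, not from the paper's fitted scaling laws:

```python
import math

def compute_optimal(flops_budget, tokens_per_param=20.0):
    """Estimate compute-optimal parameter count N and token count D for a
    FLOP budget C, using the approximation C ~= 6*N*D.

    tokens_per_param (~20) is an assumption read off Chinchilla's headline
    result (1.4e12 tokens / 70e9 params), not the paper's fitted exponents.
    """
    # With D = r*N and C = 6*N*D = 6*r*N^2, solve for N:
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's approximate budget: 6 * 70e9 params * 1.4e12 tokens
n, d = compute_optimal(6 * 70e9 * 1.4e12)
print(f"params: {n/1e9:.0f}B, tokens: {d/1e12:.1f}T")
```

Note that doubling the budget under this rule increases both N and D by √2, which is the "equal scaling" behavior: half the extra compute goes to a bigger model, half to more data.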
Source: Training Compute-Optimal Large Language Models
Task | Papers | Share |
---|---|---|
Language Modelling | 6 | 5.88% |
Question Answering | 4 | 3.92% |
Large Language Model | 3 | 2.94% |
Multi-task Language Understanding | 3 | 2.94% |
Anachronisms | 2 | 1.96% |
Common Sense Reasoning | 2 | 1.96% |
Mathematical Reasoning | 2 | 1.96% |
Multiple Choice Question Answering (MCQA) | 2 | 1.96% |
Word Sense Disambiguation | 2 | 1.96% |