Language Models

Chinchilla

Introduced by Hoffmann et al. in Training Compute-Optimal Large Language Models

Chinchilla is a 70B-parameter language model trained compute-optimally on 1.4 trillion tokens. The paper's findings suggest that, for compute-optimal training, model size and the number of training tokens should be scaled equally: for every doubling of model size, the number of training tokens should also double. Chinchilla uses the same compute budget (the same number of training FLOPs) as Gopher, but with 4x fewer parameters and 4x more training data. It is trained on MassiveText with a slightly modified SentencePiece tokenizer; further architectural details are given in the paper.

Source: Training Compute-Optimal Large Language Models
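A minimal sketch of the scaling intuition, assuming the common approximation that training cost is roughly C ≈ 6·N·D FLOPs (N parameters, D tokens) and the roughly 20 tokens-per-parameter ratio implied by Chinchilla's 70B/1.4T split. The function name and exact coefficients here are illustrative, not the paper's fitted scaling laws.

```python
import math

def compute_optimal_allocation(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a training FLOP budget into a model size and token count.

    Assumes C ~= 6 * N * D and that compute-optimal training uses about
    `tokens_per_param` tokens per parameter, so N and D both grow as
    sqrt(C). This is a rule-of-thumb sketch, not the paper's fitted law.
    """
    # C = 6 * N * D with D = r * N  =>  N = sqrt(C / (6 * r)), D = r * N
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly Chinchilla's budget: 6 * 70e9 params * 1.4e12 tokens ~= 5.9e23 FLOPs
params, tokens = compute_optimal_allocation(5.9e23)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")
```

With a Gopher-sized budget of about 5.9e23 FLOPs, this recovers roughly 70B parameters and 1.4T tokens, matching Chinchilla's configuration.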
