Language Models


Introduced by Taylor et al. in Galactica: A Large Language Model for Science

Galactica is a language model which uses a Transformer architecture in a decoder-only setup with the following modifications:

  • GeLU activations for all model sizes
  • A 2048-token context window for all model sizes
  • No biases in any of the dense kernels or layer norms
  • Learned positional embeddings
  • A 50k-token vocabulary constructed with BPE, generated from a randomly selected 2% subset of the training data
Source: Galactica: A Large Language Model for Science




