Transformers

Pathways Language Model

Introduced by Chowdhery et al. in PaLM: Scaling Language Modeling with Pathways

PaLM (Pathways Language Model) uses a standard Transformer architecture (Vaswani et al., 2017) in a decoder-only setup (i.e., each timestep can attend only to itself and past timesteps), with several modifications. It is a 540-billion-parameter, densely activated, autoregressive Transformer trained on 780 billion tokens. Training relies on Pathways (Barham et al., 2022), a system that enables highly efficient training of very large neural networks across thousands of accelerator chips.
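
The decoder-only constraint described above (each timestep attends only to itself and earlier timesteps) is usually enforced with a causal mask applied before the attention softmax. The following is a minimal pure-Python sketch of that idea, not PaLM's actual implementation; the function names `causal_mask` and `causal_attention_weights` and the toy score matrix are hypothetical and chosen only for illustration.

```python
import math

def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix of booleans.

    Entry [i][j] is True when position i may attend to position j,
    i.e. when j is the same position or an earlier one.
    """
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def causal_attention_weights(scores):
    """Row-wise softmax over raw attention scores with future positions masked out.

    `scores` is a square list of lists (hypothetical toy input); row i holds
    the raw scores of query position i against every key position.
    """
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        # Subtract the max over allowed positions for numerical stability.
        row_max = max(scores[i][j] for j in range(n) if mask[i][j])
        # Masked (future) positions contribute exactly zero weight.
        exps = [math.exp(scores[i][j] - row_max) if mask[i][j] else 0.0
                for j in range(n)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy example: uniform scores over a sequence of length 4.
scores = [[0.0] * 4 for _ in range(4)]
weights = causal_attention_weights(scores)
```

With uniform scores, position i spreads its attention evenly over the i + 1 positions it is allowed to see, and every future position receives zero weight. Production implementations typically apply the same mask to batched tensors on accelerators rather than looping in Python.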

Source: PaLM: Scaling Language Modeling with Pathways
