PaLM (Pathways Language Model) uses a standard Transformer model architecture (Vaswani et al., 2017) in a decoder-only setup (i.e., each timestep can only attend to itself and past timesteps), with several modifications. PaLM is a 540-billion-parameter, densely activated, autoregressive Transformer trained on 780 billion tokens. It is trained using Pathways (Barham et al., 2022), a system that enables highly efficient training of very large neural networks across thousands of accelerator chips.
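The decoder-only constraint described above (each timestep attends only to itself and earlier timesteps) can be illustrated with a causal attention mask. The following is a minimal pure-Python sketch, not PaLM's actual implementation; the function names `causal_mask` and `masked_softmax` and the toy 4-token input are assumptions made for illustration.

```python
import math


def causal_mask(seq_len):
    """Return mask[i][j] == True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, giving zero weight to masked entries."""
    weights = []
    for row, mask_row in zip(scores, mask):
        # Subtract the largest visible score for numerical stability.
        top = max(s for s, visible in zip(row, mask_row) if visible)
        exps = [math.exp(s - top) if visible else 0.0
                for s, visible in zip(row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy example: 4 timesteps with identical raw attention scores (hypothetical values).
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))

for row in weights:
    print(row)
```

With uniform scores, row `i` spreads its attention evenly over positions `0..i` and assigns zero weight to every future position, which is what makes the model autoregressive.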
Source: PaLM: Scaling Language Modeling with Pathways