PaLM (Pathways Language Model) uses a standard Transformer architecture (Vaswani et al., 2017) in a decoder-only setup (i.e., each timestep can only attend to itself and past timesteps), with several modifications: SwiGLU activations, parallel attention and feed-forward layers, multi-query attention, RoPE positional embeddings, shared input-output embeddings, no bias terms, and a 256k-token SentencePiece vocabulary. PaLM is trained as a 540 billion parameter, densely activated, autoregressive Transformer on 780 billion tokens. PaLM leverages Pathways (Barham et al., 2022), which enables highly efficient training of very large neural networks across thousands of accelerator chips.
Image credit: PaLM: Scaling Language Modeling with Pathways
Source: PaLM: Scaling Language Modeling with Pathways
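To make the decoder-only setup and the parallel-layer formulation concrete, here is a minimal JAX sketch. It is not PaLM's actual implementation; all function names, shapes, and hyperparameters are illustrative assumptions. It shows a single decoder block in which a causal mask restricts each position to itself and earlier positions, and the attention and SwiGLU feed-forward branches read the same normalized input and are summed into the residual stream.

```python
import jax
import jax.numpy as jnp


def layer_norm(x, eps=1e-6):
    # Simplified, parameter-free normalization; a stand-in to keep the sketch short.
    mean = jnp.mean(x, axis=-1, keepdims=True)
    var = jnp.var(x, axis=-1, keepdims=True)
    return (x - mean) / jnp.sqrt(var + eps)


def causal_self_attention(x, wq, wk, wv, wo):
    # Single-head attention over x of shape [seq_len, d_model]; the causal
    # mask lets position i attend only to positions j <= i.
    seq_len, _ = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / jnp.sqrt(q.shape[-1])
    mask = jnp.tril(jnp.ones((seq_len, seq_len), dtype=bool))
    scores = jnp.where(mask, scores, -jnp.inf)
    return (jax.nn.softmax(scores, axis=-1) @ v) @ wo


def swiglu_mlp(x, w_gate, w_up, w_down):
    # Feed-forward branch with a SwiGLU (gated SiLU) activation.
    return (jax.nn.silu(x @ w_gate) * (x @ w_up)) @ w_down


def parallel_block(x, params):
    # "Parallel" block: attention and MLP read the same normalized input and
    # are added to the residual in one step, y = x + attn(norm(x)) + mlp(norm(x)).
    h = layer_norm(x)
    return (x
            + causal_self_attention(h, *params["attn"])
            + swiglu_mlp(h, *params["mlp"]))


if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    d_model, d_ff, seq_len = 64, 256, 16
    keys = jax.random.split(key, 8)
    params = {
        "attn": [0.02 * jax.random.normal(k, (d_model, d_model)) for k in keys[:4]],
        "mlp": [0.02 * jax.random.normal(keys[4], (d_model, d_ff)),
                0.02 * jax.random.normal(keys[5], (d_model, d_ff)),
                0.02 * jax.random.normal(keys[6], (d_ff, d_model))],
    }
    x = jax.random.normal(keys[7], (seq_len, d_model))
    print(parallel_block(x, params).shape)  # (16, 64)
```

The sketch omits multi-head splitting, multi-query attention, and RoPE embeddings; a full decoder stacks many such blocks over learned token embeddings and ends with a softmax over the vocabulary.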
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 24 | 9.23% |
| Question Answering | 16 | 6.15% |
| In-Context Learning | 10 | 3.85% |
| Large Language Model | 9 | 3.46% |
| Retrieval | 6 | 2.31% |
| GSM8K | 6 | 2.31% |
| Multi-task Language Understanding | 6 | 2.31% |
| Few-Shot Learning | 6 | 2.31% |
| Common Sense Reasoning | 6 | 2.31% |