GPT-NeoX is an autoregressive transformer decoder model whose architecture largely follows that of GPT-3, with a few notable deviations. The model has 20 billion parameters, 44 layers, a hidden dimension of 6144, and 64 attention heads. The main differences from GPT-3 are a different tokenizer, the use of Rotary Positional Embeddings, the parallel computation of the attention and feed-forward layers, and a different initialization scheme and set of hyperparameters.
Source: GPT-NeoX-20B: An Open-Source Autoregressive Language Model
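For illustration, here is a minimal PyTorch sketch of the parallel attention/feed-forward layout described above. It is not the released GPT-NeoX implementation: rotary embeddings, causal masking, and the custom initialization are omitted, and the module names are hypothetical.

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    """GPT-NeoX-style transformer block with a parallel residual.

    GPT-3 applies attention and the feed-forward network sequentially;
    here both branches read the same input and are summed into the
    residual stream:  x + Attn(LN1(x)) + FFN(LN2(x)).
    Rotary embeddings and causal masking are omitted for brevity.
    """

    def __init__(self, hidden: int = 6144, heads: int = 64):
        super().__init__()
        self.ln_attn = nn.LayerNorm(hidden)  # each branch has its own LayerNorm
        self.ln_mlp = nn.LayerNorm(hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 4 * hidden),   # standard 4x FFN expansion
            nn.GELU(),
            nn.Linear(4 * hidden, hidden),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln_attn(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        # Both branches are added to the residual in a single step.
        return x + attn_out + self.mlp(self.ln_mlp(x))

# Tiny smoke test with reduced sizes (the real model stacks 44 such layers).
block = ParallelBlock(hidden=128, heads=4)
y = block(torch.randn(2, 16, 128))  # (batch, seq, hidden)
```

Because both branches read the same normalized input, their large matrix multiplications can be scheduled together, which the authors report improves training throughput relative to the sequential GPT-3 layout.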
Task | Papers | Share |
---|---|---|
Language Modelling | 3 | 37.50% |
Quantization | 1 | 12.50% |
Linguistic Acceptability | 1 | 12.50% |
Text Generation | 1 | 12.50% |
Text Detection | 1 | 12.50% |
Multi-task Language Understanding | 1 | 12.50% |