GPT-NeoX is an autoregressive transformer decoder model whose architecture largely follows that of GPT-3, with a few notable deviations. The model has 20 billion parameters, 44 layers, a hidden dimension of 6144, and 64 attention heads. The main differences from GPT-3 are a different tokenizer, the use of Rotary Positional Embeddings, the parallel computation of the attention and feed-forward layers, and a different initialization scheme and set of hyperparameters.
Source: GPT-NeoX-20B: An Open-Source Autoregressive Language Model
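For illustration, here is a minimal PyTorch sketch of the parallel attention/feed-forward layout described above. It is not the released GPT-NeoX implementation: rotary embeddings, causal masking, and the custom initialization are omitted, and the module names are hypothetical.

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    """GPT-NeoX-style transformer block with a parallel residual.

    GPT-3 applies attention and the feed-forward network sequentially;
    here both branches read the same input and are summed into the
    residual stream:  x + Attn(LN1(x)) + FFN(LN2(x)).
    Rotary embeddings and causal masking are omitted for brevity.
    """

    def __init__(self, hidden: int = 6144, heads: int = 64):
        super().__init__()
        self.ln_attn = nn.LayerNorm(hidden)  # each branch has its own LayerNorm
        self.ln_mlp = nn.LayerNorm(hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 4 * hidden),   # standard 4x FFN expansion
            nn.GELU(),
            nn.Linear(4 * hidden, hidden),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln_attn(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        # Both branches are added to the residual in a single step.
        return x + attn_out + self.mlp(self.ln_mlp(x))

# Tiny smoke test with reduced sizes (the real model stacks 44 such layers).
block = ParallelBlock(hidden=128, heads=4)
y = block(torch.randn(2, 16, 128))  # (batch, seq, hidden)
```

Because both branches read the same normalized input, their large matrix multiplications can be scheduled together, which the authors report improves training throughput relative to the sequential GPT-3 layout.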
Task | Papers | Share |
---|---|---|
Language Modelling | 3 | 37.50% |
Quantization | 1 | 12.50% |
Linguistic Acceptability | 1 | 12.50% |
Text Generation | 1 | 12.50% |
Text Detection | 1 | 12.50% |
Multi-task Language Understanding | 1 | 12.50% |