Transformers

GPT is a Transformer-based architecture and training procedure for natural language processing tasks. Training proceeds in two stages: first, a language-modeling objective on unlabeled data is used to learn the initial parameters of the network; these parameters are then adapted to a target task with the corresponding supervised objective.

Source: Improving Language Understanding by Generative Pre-Training (Radford et al., 2018)
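
The two-stage recipe can be sketched in a few lines. Below is a minimal PyTorch illustration, not the paper's actual configuration: the TinyGPT class, layer sizes, random stand-in data, and Adam settings are all placeholder assumptions. (GPT itself is a 12-layer decoder trained on BooksCorpus, and its fine-tuning stage keeps the language-modeling loss as a weighted auxiliary objective, which this sketch omits.)

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only; not the paper's configuration.
VOCAB, D_MODEL, N_LAYERS, N_HEADS, SEQ_LEN, N_CLASSES = 1000, 64, 2, 4, 32, 2

class TinyGPT(nn.Module):
    """Minimal decoder-only Transformer language model (illustrative)."""
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, D_MODEL)
        self.pos_emb = nn.Embedding(SEQ_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, N_LAYERS)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, x):
        t = x.size(1)
        pos = torch.arange(t, device=x.device)
        h = self.tok_emb(x) + self.pos_emb(pos)
        # Causal mask: each position may attend only to earlier positions.
        mask = torch.triu(torch.full((t, t), float("-inf"), device=x.device), 1)
        h = self.blocks(h, mask=mask)
        return h, self.lm_head(h)  # hidden states and next-token logits

model = TinyGPT()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# Stage 1: unsupervised pre-training with the language-modeling objective,
# i.e. predict token t+1 from tokens <= t on unlabeled text.
tokens = torch.randint(0, VOCAB, (8, SEQ_LEN))   # stand-in for a text corpus
_, logits = model(tokens[:, :-1])
lm_loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
lm_loss.backward(); opt.step(); opt.zero_grad()

# Stage 2: supervised fine-tuning; reuse the pre-trained body and train a
# task head (here a classifier on the final hidden state) on labeled data.
clf_head = nn.Linear(D_MODEL, N_CLASSES)
ft_opt = torch.optim.Adam(
    list(model.parameters()) + list(clf_head.parameters()), lr=3e-4)
inputs = torch.randint(0, VOCAB, (8, SEQ_LEN))   # stand-in labeled examples
labels = torch.randint(0, N_CLASSES, (8,))
hidden, _ = model(inputs)
clf_loss = nn.functional.cross_entropy(clf_head(hidden[:, -1]), labels)
clf_loss.backward(); ft_opt.step(); ft_opt.zero_grad()
```

Using the last position's hidden state for classification mirrors the paper's approach of feeding the final token's activation into the task head; in practice inputs are wrapped with start, delimiter, and extract tokens, which this sketch leaves out.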

Tasks


Task                  Papers   Share
Language Modelling        84  10.62%
Large Language Model      50   6.32%
Question Answering        34   4.30%
Prompt Engineering        27   3.41%
Retrieval                 23   2.91%
Text Generation           22   2.78%
In-Context Learning       21   2.65%
Decision Making           20   2.53%
Sentence                  20   2.53%
