Transformers

GPT is a Transformer-based architecture and training procedure for natural language processing tasks. Training proceeds in two stages: first, a language modeling objective is used on unlabeled data to learn the initial parameters of a neural network model; these parameters are then adapted to a target task using the corresponding supervised objective. A minimal sketch of this procedure follows the source citation below.

Source: Improving Language Understanding by Generative Pre-Training
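
The following is a minimal sketch of the two-stage procedure in PyTorch. The model size, hyperparameters, and random stand-in data are illustrative assumptions, not those of the paper; the point is only the shape of the pipeline: a language-modeling loss on unlabeled sequences first, then a supervised loss on the same backbone with a task head.

```python
import torch
import torch.nn as nn

VOCAB, D_MODEL, SEQ_LEN, N_CLASSES = 1000, 64, 32, 2  # toy sizes (assumptions)

class TinyTransformerLM(nn.Module):
    """A deliberately small causal Transformer, standing in for GPT."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(SEQ_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)       # stage-1 head
        self.clf_head = nn.Linear(D_MODEL, N_CLASSES)  # stage-2 head

    def features(self, x):
        pos = torch.arange(x.size(1), device=x.device)
        h = self.embed(x) + self.pos(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.blocks(h, mask=mask)               # causal self-attention

model = TinyTransformerLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stage 1: unsupervised pre-training with the language-modeling objective,
# i.e. predict token t+1 from tokens 1..t on unlabeled sequences.
tokens = torch.randint(0, VOCAB, (8, SEQ_LEN))         # stand-in for raw text
logits = model.lm_head(model.features(tokens[:, :-1]))
lm_loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
lm_loss.backward()
opt.step()
opt.zero_grad()

# Stage 2: supervised fine-tuning, reusing the pre-trained parameters and
# reading the prediction from the final token's hidden state.
labels = torch.randint(0, N_CLASSES, (8,))             # stand-in task labels
clf_logits = model.clf_head(model.features(tokens)[:, -1])
clf_loss = nn.functional.cross_entropy(clf_logits, labels)
clf_loss.backward()
opt.step()
```

In the paper, the fine-tuning stage can also keep the language-modeling loss as an auxiliary objective alongside the supervised loss, which is omitted here for brevity.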

Tasks


Task                  Papers  Share
Language Modelling        85  10.90%
Large Language Model      51   6.54%
Question Answering        33   4.23%
Prompt Engineering        27   3.46%
Text Generation           23   2.95%
Retrieval                 21   2.69%
Sentence                  20   2.56%
Decision Making           19   2.44%
In-Context Learning       18   2.31%
