Transformers

GPT-3

Introduced by Brown et al. in Language Models are Few-Shot Learners

GPT-3 is an autoregressive transformer model with 175 billion parameters. It uses the same model and architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization, with the exception that GPT-3 uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer.
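The alternating attention pattern can be illustrated with a small sketch. The snippet below is a hypothetical construction (the `band` and `stride` values and function names are assumptions, not the paper's exact configuration): even layers get a dense causal mask, odd layers a locally banded sparse mask in the spirit of the Sparse Transformer, where each position attends to its recent neighbors plus a strided subset of earlier positions.

```python
import numpy as np

def dense_mask(n):
    # Standard causal mask: each position attends to every earlier position.
    return np.tril(np.ones((n, n), dtype=bool))

def banded_sparse_mask(n, band=4, stride=4):
    # Illustrative "locally banded" causal mask (parameters are assumptions):
    # each position attends to the last `band` positions plus every
    # `stride`-th earlier position.
    offsets = np.subtract.outer(np.arange(n), np.arange(n))  # i - j
    causal = offsets >= 0
    local = np.abs(offsets) < band
    strided = (offsets % stride) == 0
    return causal & (local | strided)

def layer_masks(n_layers, n, band=4, stride=4):
    # Alternate dense and sparse patterns across layers.
    return [dense_mask(n) if i % 2 == 0 else banded_sparse_mask(n, band, stride)
            for i in range(n_layers)]
```

Because the sparse mask keeps only O(band + n/stride) attended positions per query instead of O(n), alternating it with dense layers reduces attention cost at long context lengths while still letting information mix globally.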

Source: Language Models are Few-Shot Learners

Tasks


Task                  Papers  Share
Language Modelling        62  6.84%
Large Language Model      51  5.63%
Language Modeling         49  5.41%
Question Answering        48  5.30%
RAG                       31  3.42%
Retrieval                 29  3.20%
In-Context Learning       26  2.87%
Code Generation           26  2.87%
Few-Shot Learning         19  2.10%