Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm

15 Feb 2021 · Laria Reynolds, Kyle McDonell

Prevailing methods for mapping large generative language models to supervised tasks may fail to sufficiently probe models' novel capabilities. Using GPT-3 as a case study, we show that 0-shot prompts can significantly outperform few-shot prompts...
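To make the abstract's 0-shot vs. few-shot distinction concrete, here is an illustrative sketch of the two prompt styles on a hypothetical translation task (the task, wording, and examples are not taken from the paper): a few-shot prompt embeds worked input/output demonstrations, while a 0-shot prompt instead states the task directly in natural language.

```python
# Hypothetical illustration of few-shot vs. 0-shot prompting styles.
# These are plain prompt strings; no model call is made.

# Few-shot: the task is conveyed implicitly through worked examples,
# and the model is expected to continue the pattern.
few_shot_prompt = "\n".join([
    "English: cheese => French: fromage",
    "English: house => French: maison",
    "English: dog => French:",
])

# 0-shot: the task is stated explicitly as an instruction,
# with no demonstrations at all.
zero_shot_prompt = (
    "Translate English to French.\n"
    "English: dog\n"
    "French:"
)

print(few_shot_prompt)
print(zero_shot_prompt)
```

The paper's claim is that, with careful prompt design, the second style can match or beat the first on some tasks, despite providing no examples.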




Methods used in the Paper


METHOD                               TYPE
Cosine Annealing                     Learning Rate Schedules
GELU                                 Activation Functions
Strided Attention                    Attention Patterns
Attention Dropout                    Regularization
Layer Normalization                  Normalization
Fixed Factorized Attention           Attention Patterns
Dense Connections                    Feedforward Networks
Adam                                 Stochastic Optimization
Linear Warmup With Cosine Annealing  Learning Rate Schedules
Scaled Dot-Product Attention         Attention Mechanisms
Softmax                              Output Functions
BPE                                  Subword Segmentation
Dropout                              Regularization
Residual Connection                  Skip Connections
Weight Decay                         Regularization
Multi-Head Attention                 Attention Modules
GPT-3                                Transformers
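Several of the listed components (Scaled Dot-Product Attention, Softmax) combine in the Transformer blocks underlying GPT-3. A minimal NumPy sketch of single-head scaled dot-product attention, for orientation only (not the paper's code; shapes and seed are arbitrary):

```python
import numpy as np

def softmax(x, axis=-1):
    # Softmax output function; subtract the max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to each key
    weights = softmax(scores)          # each row is a distribution over keys
    return weights @ V, weights        # weighted sum of values

# Toy example: 3 query vectors attending over 4 key/value vectors of dim 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Multi-Head Attention runs several such heads in parallel on learned projections of Q, K, and V and concatenates the results.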