Training Techniques | SGD |
---|---|
Architecture | Convolution, Dropout, Feedforward Network, GPT-2, Layer Normalization, Linear Layer |
LR | 0.01 |
This is the public 117M-parameter OpenAI GPT-2 Small language model for generating text. The model embeds the input tokens, contextualizes them, predicts the next token, and computes a loss against the known targets.
If a `BeamSearch` object is given, this model will predict a sequence of next tokens.

Explore the live Language Modeling demo at AllenNLP.
```python
from allennlp_models.pretrained import load_predictor

predictor = load_predictor("lm-next-token-lm-gpt2")

# Greedily extend the sentence one token at a time.
sentence = "In this example we are going to"
for i in range(30):
    preds = predictor.predict(sentence)
    sentence += preds["top_tokens"][0][0].replace("Ġ", " ").replace("Ċ", "\n")
print(sentence)
# prints:
# In this example we are going to use the following code to create a new class called "Cookie".
#
#
# public class Cookie { public static void main(String[] args)
```
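The `replace` calls in the loop undo GPT-2's byte-level BPE markers, where "Ġ" denotes a leading space and "Ċ" a newline. A minimal helper (the name `detokenize` is hypothetical, not part of the library) makes the mapping explicit:

```python
def detokenize(tokens):
    # Join GPT-2 BPE tokens into plain text: "Ġ" marks a leading
    # space and "Ċ" marks a newline in the byte-level vocabulary.
    return "".join(tokens).replace("Ġ", " ").replace("Ċ", "\n")

print(detokenize(["In", "Ġthis", "Ġexample"]))  # prints: In this example
```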
@inproceedings{Radford2019LanguageMA,
author = {A. Radford and Jeffrey Wu and R. Child and David Luan and Dario Amodei and Ilya Sutskever},
title = {Language Models are Unsupervised Multitask Learners},
year = {2019}
}
BENCHMARK | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
---|---|---|---|---|
One Billion Word | GPT2-based Next Token Language Model | PPL | 75.2 | # 1 |
Penn Treebank (Word Level) | GPT2-based Next Token Language Model | Test perplexity | 65.85 | # 1 |
LAMBADA | GPT2-based Next Token Language Model | Accuracy | 45.99 | # 1 |
LAMBADA | GPT2-based Next Token Language Model | Perplexity | 35.13 | # 1 |
WikiText-103 | GPT2-based Next Token Language Model | Test perplexity | 37.5 | # 1 |
WikiText-2 | GPT2-based Next Token Language Model | Test perplexity | 29.41 | # 1 |
Text8 | GPT2-based Next Token Language Model | Bits per Character (BPC) | 1.17 | # 1 |
enwik8 | GPT2-based Next Token Language Model | Bits per Character (BPC) | 1.16 | # 1 |
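For reference, perplexity (PPL) is the exponential of the average per-token negative log-likelihood in nats, and bits per character (BPC) is the average per-character negative log-likelihood converted from nats to bits. A minimal sketch of the two conversions (helper names are ours, not from the library):

```python
import math

def perplexity(avg_nll_nats: float) -> float:
    # Perplexity is exp of the average per-token negative log-likelihood (nats).
    return math.exp(avg_nll_nats)

def bits_per_character(avg_nll_nats_per_char: float) -> float:
    # BPC converts the average per-character negative log-likelihood
    # from nats to bits (divide by ln 2).
    return avg_nll_nats_per_char / math.log(2)

print(perplexity(0.0))                   # zero loss -> perplexity 1.0
print(bits_per_character(math.log(2)))   # ln 2 nats per character -> 1.0 bit
```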
BENCHMARK | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
---|---|---|---|---|
Children's Book Test | GPT2-based Next Token Language Model | Accuracy-CN | 87.65 | # 1 |
Children's Book Test | GPT2-based Next Token Language Model | Accuracy-NE | 83.4 | # 1 |