GPT2-based Next Token Language Model

Last updated on Mar 15, 2021

Parameters 163 Million
File Size 578.30 MB
Training Data WebText

Training Techniques SGD
Architecture Convolution, Dropout, Feedforward Network, GPT-2, Layer Normalization, Linear Layer
LR 0.01
Epochs 1

Summary

This is the public 117M-parameter OpenAI GPT-2 Small language model for generating sentences. The model embeds the input tokens, contextualizes them, and then predicts the next word, computing a loss against the known target. If a BeamSearch object is given, the model predicts a sequence of next tokens rather than a single one.
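
To make that objective concrete, here is a minimal sketch of the same next-token setup written against the Hugging Face transformers GPT-2 weights (an assumption made purely for illustration; this card is served through AllenNLP, but the underlying gpt2 checkpoint is the same):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Embed and contextualize the input tokens; passing labels makes the model
# compute the next-token cross-entropy loss against the known targets
# (the labels are shifted internally).
enc = tokenizer("In this example we are going to", return_tensors="pt")
with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])

print(out.loss)                              # loss against the known targets
next_id = int(out.logits[0, -1].argmax())    # most likely next token
print(tokenizer.decode(next_id))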

Explore the live Language Modeling demo at AllenNLP.

How do I load this model?

from allennlp_models.pretrained import load_predictor
predictor = load_predictor("lm-next-token-lm-gpt2")

Getting predictions

sentence = "In this example we are going to"
for i in range(30):
    preds = predictor.predict(sentence)
    sentence += preds["top_tokens"][0][0].replace("Ġ", " ").replace("Ċ", "\n")

# prints:
# In this example we are going to use the following code to create a new class called "Cookie".
# 
#
# public class Cookie { public static void main(String[] args)
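
The replace("Ġ", " ").replace("Ċ", "\n") calls above undo only part of GPT-2's byte-level BPE, which maps every raw byte to a printable unicode character so that all tokens are displayable. A fuller decode inverts the whole mapping; the sketch below mirrors the bytes_to_unicode table from OpenAI's original encoder.py:

def bytes_to_unicode():
    # GPT-2's byte-to-unicode table: printable bytes map to themselves,
    # the rest are shifted into unused code points starting at 256.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))

byte_decoder = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    # Map each display character back to its byte, then decode as UTF-8.
    return bytes(byte_decoder[c] for c in token).decode("utf-8", errors="replace")

print(decode_token("Ġgoing"))  # " going" -- "Ġ" stands for byte 0x20 (space)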

Citation

@inproceedings{Radford2019LanguageMA,
  author = {Alec Radford and Jeffrey Wu and Rewon Child and David Luan and Dario Amodei and Ilya Sutskever},
  title = {Language Models are Unsupervised Multitask Learners},
  year = {2019}
}

Results

Language Modelling

| Benchmark | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|
| One Billion Word | GPT2-based Next Token Language Model | PPL | 75.2 | #1 |
| Penn Treebank (Word Level) | GPT2-based Next Token Language Model | Test perplexity | 65.85 | #1 |
| LAMBADA | GPT2-based Next Token Language Model | Accuracy | 45.99 | #1 |
| LAMBADA | GPT2-based Next Token Language Model | Perplexity | 35.13 | #1 |
| WikiText-103 | GPT2-based Next Token Language Model | Test perplexity | 37.5 | #1 |
| WikiText-2 | GPT2-based Next Token Language Model | Test perplexity | 29.41 | #1 |
| Text8 | GPT2-based Next Token Language Model | Bits per Character (BPC) | 1.17 | #1 |
| enwik8 | GPT2-based Next Token Language Model | Bits per Character (BPC) | 1.16 | #1 |
Question Answering

| Benchmark | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|
| Children's Book Test | GPT2-based Next Token Language Model | Accuracy-CN | 87.65 | #1 |
| Children's Book Test | GPT2-based Next Token Language Model | Accuracy-NE | 83.4 | #1 |