Training Techniques | SGD |
---|---|
Architecture | Convolution, Dropout, Feedforward Network, GPT-2, Layer Normalization, Linear Layer |
LR | 0.01 |
This is the public 117M-parameter OpenAI GPT-2 Small language model for generating text. The model embeds the input tokens, contextualizes them, predicts the next token, and computes a loss against the known targets.
If a `BeamSearch` object is given, this model will predict a sequence of next tokens.

Explore the live Language Modeling demo at AllenNLP.
```python
from allennlp_models.pretrained import load_predictor

predictor = load_predictor("lm-next-token-lm-gpt2")

# Greedily extend the sentence one token at a time.
sentence = "In this example we are going to"
for i in range(30):
    preds = predictor.predict(sentence)
    sentence += preds["top_tokens"][0][0].replace("Ġ", " ").replace("Ċ", "\n")
print(sentence)
# prints:
# In this example we are going to use the following code to create a new class called "Cookie".
#
#
# public class Cookie { public static void main(String[] args)
```
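The `replace` calls in the loop undo GPT-2's byte-level BPE markers, where "Ġ" denotes a leading space and "Ċ" a newline. A minimal helper (the name `detokenize` is hypothetical, not part of the library) makes the mapping explicit:

```python
def detokenize(tokens):
    # Join GPT-2 BPE tokens into plain text: "Ġ" marks a leading
    # space and "Ċ" marks a newline in the byte-level vocabulary.
    return "".join(tokens).replace("Ġ", " ").replace("Ċ", "\n")

print(detokenize(["In", "Ġthis", "Ġexample"]))  # prints: In this example
```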
@inproceedings{Radford2019LanguageMA,
author = {A. Radford and Jeffrey Wu and R. Child and David Luan and Dario Amodei and Ilya Sutskever},
title = {Language Models are Unsupervised Multitask Learners},
year = {2019}
}
BENCHMARK | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
---|---|---|---|---|
One Billion Word | GPT2-based Next Token Language Model | PPL | 75.2 | # 1 |
Penn Treebank (Word Level) | GPT2-based Next Token Language Model | Test perplexity | 65.85 | # 1 |
LAMBADA | GPT2-based Next Token Language Model | Accuracy | 45.99 | # 1 |
LAMBADA | GPT2-based Next Token Language Model | Perplexity | 35.13 | # 1 |
WikiText-103 | GPT2-based Next Token Language Model | Test perplexity | 37.5 | # 1 |
WikiText-2 | GPT2-based Next Token Language Model | Test perplexity | 29.41 | # 1 |
Text8 | GPT2-based Next Token Language Model | Bits per Character (BPC) | 1.17 | # 1 |
enwik8 | GPT2-based Next Token Language Model | Bits per Character (BPC) | 1.16 | # 1 |
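For reference, perplexity (PPL) is the exponential of the average per-token negative log-likelihood in nats, and bits per character (BPC) is the average per-character negative log-likelihood converted from nats to bits. A minimal sketch of the two conversions (helper names are ours, not from the library):

```python
import math

def perplexity(avg_nll_nats: float) -> float:
    # Perplexity is exp of the average per-token negative log-likelihood (nats).
    return math.exp(avg_nll_nats)

def bits_per_character(avg_nll_nats_per_char: float) -> float:
    # BPC converts the average per-character negative log-likelihood
    # from nats to bits (divide by ln 2).
    return avg_nll_nats_per_char / math.log(2)

print(perplexity(0.0))                   # zero loss -> perplexity 1.0
print(bits_per_character(math.log(2)))   # ln 2 nats per character -> 1.0 bit
```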
BENCHMARK | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
---|---|---|---|---|
Children's Book Test | GPT2-based Next Token Language Model | Accuracy-CN | 87.65 | # 1 |
Children's Book Test | GPT2-based Next Token Language Model | Accuracy-NE | 83.4 | # 1 |