Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText...
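The leaderboard below tracks how this zero-shot setup scores on standard benchmarks. As a rough illustration of how such numbers are produced, here is a minimal sketch (not the paper's original evaluation code) that measures the perplexity of a pretrained GPT-2 checkpoint on WikiText-2 with the Hugging Face `transformers` and `datasets` libraries; the small `gpt2` checkpoint and the sliding-window stride are assumptions, and without the paper's invertible de-tokenizers the resulting number will not match the leaderboard entry.

```python
# Hedged sketch: zero-shot perplexity of a pretrained GPT-2 checkpoint on
# WikiText-2 test. Checkpoint ("gpt2") and stride are assumptions.
import math
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Concatenate the test split into one long string and tokenize it once.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
encodings = tokenizer(text, return_tensors="pt")
seq_len = encodings.input_ids.size(1)

max_len, stride = 1024, 512  # GPT-2 context size; stride controls overlap
nll_sum, n_tokens = 0.0, 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_len, seq_len)
    trg_len = end - begin if begin == 0 else min(stride, end - begin)
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # score only the new tokens in this window
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss  # mean NLL in nats
    nll_sum += loss.item() * trg_len
    n_tokens += trg_len
    if end == seq_len:
        break

print("zero-shot perplexity:", math.exp(nll_sum / n_tokens))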
Ranked #1 on Language Modelling on enwik8 (using extra training data).
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data | Benchmark |
|---|---|---|---|---|---|---|---|
| Question Answering | Children's Book Test | GPT-2 | Accuracy-CN | 93.30% | # 1 | | |
| Question Answering | Children's Book Test | GPT-2 | Accuracy-NE | 89.05% | # 1 | | |
| Document Summarization | CNN / Daily Mail | GPT-2 | ROUGE-1 | 29.34 | # 18 | | |
| Document Summarization | CNN / Daily Mail | GPT-2 | ROUGE-2 | 8.27 | # 18 | | |
| Document Summarization | CNN / Daily Mail | GPT-2 | ROUGE-L | 26.58 | # 18 | | |
| Language Modelling | enwik8 | GPT-2 (48 layers, h=1600) | Bit per Character (BPC) | 0.93 | # 1 | | |
| Language Modelling | enwik8 | GPT-2 (48 layers, h=1600) | Number of params | 1542M | # 1 | | |
| Multi-Task Learning | Hendrycks Test | GPT-2 | Accuracy (%) | 32.4 | # 3 | | |
| Language Modelling | One Billion Word | GPT-2 | PPL | 42.16 | # 19 | | |
| Language Modelling | One Billion Word | GPT-2 | Number of params | 1.54B | # 1 | | |
| Language Modelling | Penn Treebank (Word Level) | GPT-2 | Test perplexity | 35.76 | # 3 | | |
| Language Modelling | Penn Treebank (Word Level) | GPT-2 | Params | 1542M | # 2 | | |
| Language Modelling | Text8 | GPT-2 | Bit per Character (BPC) | 0.98 | # 1 | | |
| Language Modelling | Text8 | GPT-2 | Number of params | 1542M | # 1 | | |
| Language Modelling | The Pile | GPT-2 (Zero-Shot) | Bits per byte | 1.2253 | # 2 | | |
| Language Modelling | WikiText-103 | GPT-2 Small | Test perplexity | 37.50 | # 42 | | |
| Language Modelling | WikiText-103 | GPT-2 Small | Number of params | 124M | # 16 | | |
| Language Modelling | WikiText-103 | GPT-2 Medium | Test perplexity | 26.37 | # 28 | | |
| Language Modelling | WikiText-103 | GPT-2 Medium | Number of params | 355M | # 5 | | |
| Language Modelling | WikiText-103 | GPT-2 Large | Test perplexity | 22.05 | # 20 | | |
| Language Modelling | WikiText-103 | GPT-2 Large | Number of params | 774M | # 3 | | |
| Language Modelling | WikiText-103 | GPT-2 Full | Test perplexity | 17.48 | # 8 | | |
| Language Modelling | WikiText-103 | GPT-2 Full | Number of params | 1542M | # 2 | | |
| Language Modelling | WikiText-2 | GPT-2 | Test perplexity | 18.34 | # 1 | | |
| Language Modelling | WikiText-2 | GPT-2 | Number of params | 1542M | # 1 | | |
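The language-modelling rows mix metrics: the word-level benchmarks report test perplexity, enwik8 and Text8 report bits per character, and The Pile reports bits per byte. All three are monotone transforms of the model's mean cross-entropy; a hedged sketch of the conversions, where `chars_per_token` and `bytes_per_token` are assumed corpus statistics rather than values from the paper:

```python
import math

def lm_metrics(nll_nats_per_token: float, chars_per_token: float, bytes_per_token: float):
    """Convert mean negative log-likelihood (nats per token) into the
    leaderboard's language-modelling metrics."""
    perplexity = math.exp(nll_nats_per_token)           # WikiText / PTB / One Billion Word rows
    bits_per_token = nll_nats_per_token / math.log(2)   # cross-entropy in bits
    bpc = bits_per_token / chars_per_token               # enwik8 / Text8 rows
    bpb = bits_per_token / bytes_per_token               # The Pile row
    return perplexity, bpc, bpb
```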