Language Modelling

4482 papers with code • 51 benchmarks • 157 datasets

Language Modeling is the task of predicting the next word or character in a document. Models trained on this task can be applied to a wide range of downstream natural language tasks such as text generation, text classification, and question answering.
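To make the task concrete, here is a minimal sketch of next-word prediction using a toy bigram count model over a hypothetical corpus (the corpus and function names are illustrative, not from any benchmark):

```python
from collections import Counter, defaultdict

# Toy corpus for illustration only.
corpus = "the cat sat on the mat the cat ate".split()

# Count bigram continuations: bigrams[w] maps each word w to a
# Counter of the words that follow it in the corpus.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation of `word`, or None if unseen."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "cat" ("cat" follows "the" twice, "mat" once)
```

Neural language models replace these raw counts with learned probability distributions over the vocabulary, but the prediction target is the same.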

Historically, language modelling was done with N-gram language models (which still have niche uses). Since the 2010s, neural language models have taken over, and since the 2020s state-of-the-art results have been achieved almost exclusively with large language models (LLMs).

A model's language modeling capability is typically measured with cross-entropy and perplexity. Common evaluation datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile, among others.
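The two metrics are directly related: perplexity is the exponential of the average per-token cross-entropy (negative log-likelihood). A minimal sketch, assuming per-token log-probabilities from some model:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the mean negative log-likelihood (in nats).

    `log_probs` is the natural-log probability the model assigned to
    each token in the evaluation text.
    """
    nll = -sum(log_probs) / len(log_probs)  # average cross-entropy
    return math.exp(nll)

# Hypothetical per-token probabilities assigned by a model.
token_log_probs = [math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(token_log_probs))  # cube root of 32, about 3.17
```

Lower perplexity is better; a model that assigns probability 1/k to every token has perplexity exactly k.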

Check below for all state-of-the-art models.

( Image credit: Exploring the Limits of Language Modeling )

Libraries

Use these libraries to find Language Modelling models and implementations
See all 15 libraries.

CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies

salt-nlp/culturebank 23 Apr 2024

To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale.


Setting up the Data Printer with Improved English to Ukrainian Machine Translation

lang-uk/dragoman 23 Apr 2024

To build large language models for Ukrainian, we need to expand our corpora with large amounts of new algorithmic tasks expressed in natural language.


OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

apple/corenet 22 Apr 2024

To this end, we release OpenELM, a state-of-the-art open language model.


How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

macaronlin/llama3-quantization 22 Apr 2024

This exploration holds the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing the performance degradation observed in LLM compression.


SpaceByte: Towards Deleting Tokenization from Large Language Modeling

kjslag/spacebyte 22 Apr 2024

Tokenization is widely used in large language models because it significantly improves performance.


VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

haoyiq114/valor 22 Apr 2024

To address these issues, we introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.


Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

janphilippfranken/sami 22 Apr 2024

On single-turn dialogue and summarization, a SAMI-trained mistral-7b outperforms the initial pretrained model, with win rates between 66% and 77%.


Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

tyler-griggs/melange-release 22 Apr 2024

Within this space, we show that there is not a linear relationship between GPU cost and performance, and identify three key LLM service characteristics that significantly affect which GPU type is the most cost effective: model request size, request rate, and latency service-level objective (SLO).


CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment

zhoukanglei/cofinal_aqa 22 Apr 2024

However, this common strategy yields suboptimal results due to the inherent struggle of these backbones to capture the subtle cues essential for AQA.


A Survey on the Memory Mechanism of Large Language Model based Agents

nuster1128/llm_agent_memory_survey 21 Apr 2024

Compared with original LLMs, LLM-based agents are characterized by their self-evolving capability, which is the basis for solving real-world problems that require long-term and complex agent-environment interactions.
