Language modeling is the task of predicting the next word or character in a document.
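As a minimal, self-contained illustration of this next-character prediction objective (the toy corpus, variable names, and count-based bigram model below are assumptions for illustration, not drawn from any of the papers listed here):

```python
# A minimal sketch of character-level language modeling: estimate
# P(next char | current char) from bigram counts over a toy corpus.
from collections import Counter, defaultdict

corpus = "language modeling predicts the next character in a document"

# Count how often each character follows another.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(char):
    """Return the most frequent next character and its estimated probability."""
    counts = bigram_counts[char]
    total = sum(counts.values())
    best, freq = counts.most_common(1)[0]
    return best, freq / total

print(predict_next("t"))  # most likely character to follow 't' in the toy corpus
```

Real language models replace these raw counts with a neural network that outputs a probability distribution over the whole vocabulary, but the prediction task itself is the same.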
Current state-of-the-art results in multilingual natural language inference (NLI) are based on tuning XLM (a pre-trained polyglot language model) separately for each language involved, resulting in multiple models.
In this work, we revisit the gating mechanisms widely used in various recurrent and feedforward networks, such as LSTMs, GRUs, and highway networks.
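For reference, a minimal sketch of the gating pattern these architectures share: a sigmoid gate in (0, 1) interpolates between a transformed signal and a carried-over input. The dimensions, parameter names, and highway-style combination below are assumed for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # feature dimension (assumed)
x = rng.normal(size=d)                 # layer input

W_h, b_h = rng.normal(size=(d, d)), np.zeros(d)   # transform parameters
W_g, b_g = rng.normal(size=(d, d)), np.zeros(d)   # gate parameters

h = np.tanh(W_h @ x + b_h)                        # candidate transformation
g = 1.0 / (1.0 + np.exp(-(W_g @ x + b_g)))        # sigmoid gate values in (0, 1)

# Highway-style gating: the gate decides how much of the transform vs. the
# raw input passes through; LSTM and GRU gates follow the same sigmoid pattern.
y = g * h + (1.0 - g) * x
print(y.shape)  # (8,)
```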
In particular, the input word embedding matrix accounts for a significant proportion of the model's memory footprint, due to the large input vocabulary and embedding dimensions.
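A back-of-the-envelope sketch of that footprint, using an assumed (hypothetical) vocabulary size and embedding width rather than figures from the paper:

```python
# The embedding matrix holds vocab_size x embedding_dim parameters,
# which often dominates a compact language model's memory budget.
vocab_size = 50_000        # word-level vocabulary (assumed)
embedding_dim = 512        # embedding width (assumed)
bytes_per_param = 4        # float32

embedding_params = vocab_size * embedding_dim
embedding_mb = embedding_params * bytes_per_param / 1e6
print(f"{embedding_params:,} parameters ~ {embedding_mb:.0f} MB")
# 25,600,000 parameters ~ 102 MB for the embeddings alone
```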
We propose the projected error function regularization loss (PER) that encourages activations to follow the standard normal distribution.
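The precise PER formulation is defined in the paper; as a generic illustration of the idea only, one simple stand-in penalty pushes a batch of activations toward zero mean and unit variance (this is not the paper's loss, and the weighting and shapes below are assumptions):

```python
import torch

def activation_normality_penalty(activations: torch.Tensor) -> torch.Tensor:
    """Zero when the activations have zero mean and unit variance."""
    mean = activations.mean()
    var = activations.var(unbiased=False)
    return mean.pow(2) + (var - 1.0).pow(2)

acts = torch.randn(32, 128) * 2.0 + 0.5   # deliberately mis-scaled activations
penalty = activation_normality_penalty(acts)
print(penalty.item())  # > 0; would be added to the task loss with some weight
```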
Humans understand novel sentences by composing meanings and roles of core language components.
Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted considerable attention in natural language understanding (NLU) and has achieved state-of-the-art accuracy on various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering.