Language Modelling

4526 papers with code • 51 benchmarks • 157 datasets

Language Modeling is the task of predicting the next word or character in a document. This technique can be used to train language models that can then be applied to a wide range of natural language tasks such as text generation, text classification, and question answering.

Historically, language modelling was done with N-gram language models (which still have niche uses). Since the 2010s, neural language models have dominated, and since the early 2020s state-of-the-art results have been achieved almost exclusively by large language models (LLMs).
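To make the N-gram approach concrete, here is a minimal sketch of a bigram language model: it counts word-to-word transitions in a toy corpus and normalises them into conditional probabilities P(next | current). The corpus and function names are illustrative, not from any particular library.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count bigram transitions and normalise into P(next | current)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
# "the" is followed by "cat" twice and "dog" once in the toy corpus,
# so the model predicts "cat" after "the" with probability 2/3.
print(model["the"]["cat"])
```

Neural language models replace these counted tables with learned parametric functions, which lets them generalise to contexts never seen verbatim in training.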

A model's language modeling capability is typically measured with cross-entropy and perplexity. Common evaluation datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile, among others.
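The two metrics are directly related: perplexity is the exponential of the average per-token cross-entropy (negative log-likelihood). A minimal sketch, assuming we already have the probability the model assigned to each observed token:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(average negative log-likelihood of the observed tokens)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every observed token has
# perplexity ~4: it is "as confused" as a uniform choice among 4 tokens.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model spreads less probability mass away from the tokens that actually occur.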


(Image credit: Exploring the Limits of Language Modeling)

Libraries

Use these libraries to find Language Modelling models and implementations

Most implemented papers

Cross-lingual Language Model Pretraining

huggingface/transformers NeurIPS 2019

On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU.

The Curious Case of Neural Text Degeneration

ari-holtzman/degen ICLR 2020

Despite considerable advancements with deep neural language models, the enigma of neural text degeneration persists when these models are tested as text generators.

How to Fine-Tune BERT for Text Classification?

xuyige/BERT4doc-Classification 14 May 2019

Language model pre-training has proven to be useful in learning universal language representations.

Decision Transformer: Reinforcement Learning via Sequence Modeling

kzl/decision-transformer NeurIPS 2021

In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.

Self-Instruct: Aligning Language Models with Self-Generated Instructions

tatsu-lab/stanford_alpaca 20 Dec 2022

Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

HKUST-KnowComp/R-Net NeurIPS 2016

Recent results at the intersection of Bayesian modelling and deep learning offer a Bayesian interpretation of common deep learning techniques such as dropout.

Reformer: The Efficient Transformer

google/trax ICLR 2020

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences.

Linformer: Self-Attention with Linear Complexity

facebookresearch/fairseq 8 Jun 2020

Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

guidance-ai/guidance 28 Jan 2022

We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.
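The technique is purely a prompting pattern: a few-shot exemplar spells out its intermediate reasoning before the answer, which elicits similar step-by-step reasoning on the new question. The sketch below follows the arithmetic-exemplar style used in the paper; the second question is illustrative.

```python
# A few-shot chain-of-thought prompt: the exemplar shows its intermediate
# reasoning steps, nudging the model to reason step by step on the new
# question before committing to an answer.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: A cafeteria had 23 apples. It used 20 to make lunch and bought 6 more.
How many apples does it have now?
A:"""
print(cot_prompt)
```

The prompt string would be sent as-is to any text-completion model; no fine-tuning is involved.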

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

huggingface/trl NeurIPS 2023

Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF).