Language Modelling
4526 papers with code • 51 benchmarks • 157 datasets
Language Modeling is the task of predicting the next word or character in a document. This technique can be used to train language models that can further be applied to a wide range of natural language tasks like text generation, text classification, and question answering.
Historically, language modelling was done with N-gram language models (which still have niche uses), but since the 2010s neural language models took over, and starting from the 2020s SOTA was achieved exclusively with large language models (LLMs).
A model's language modeling capability is measured using cross-entropy and perplexity. Some datasets to evaluate language modeling are WikiText-103, One Billion Word, Text8, C4, The Pile, among others.
Some notable state-of-the-art language models include:
Check below for all state-of-the-art models.
Here are some additional readings to go deeper on the task:
- Language Modeling - Lena Voita
( Image credit: Exploring the Limits of Language Modeling )
Libraries
Use these libraries to find Language Modelling models and implementationsDatasets
Subtasks
Most implemented papers
Cross-lingual Language Model Pretraining
On unsupervised machine translation, we obtain 34. 3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU.
The Curious Case of Neural Text Degeneration
Despite considerable advancements with deep neural language models, the enigma of neural text degeneration persists when these models are tested as text generators.
How to Fine-Tune BERT for Text Classification?
Language model pre-training has proven to be useful in learning universal language representations.
Decision Transformer: Reinforcement Learning via Sequence Modeling
In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Recent results at the intersection of Bayesian modelling and deep learning offer a Bayesian interpretation of common deep learning techniques such as dropout.
Reformer: The Efficient Transformer
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences.
Linformer: Self-Attention with Linear Complexity
Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF).