2742 papers with code • 50 benchmarks • 142 datasets
Language Modeling is the task of predicting the next word or character in a document. Models trained on this task can then be applied to a wide range of natural language tasks such as text generation, text classification, and question answering.
Common types of language modeling techniques include:
- N-gram Language Models (a minimal sketch follows this list)
- Neural Language Models
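As a rough illustration of the n-gram approach, here is a minimal bigram language model in Python. The toy corpus, the add-alpha smoothing, and all function names are illustrative assumptions, not taken from any paper listed on this page.

```python
from collections import defaultdict, Counter

def train_bigram_lm(sentences):
    """Count bigram and unigram frequencies from tokenized sentences (illustrative sketch)."""
    bigrams = defaultdict(Counter)
    unigrams = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, curr in zip(padded, padded[1:]):
            bigrams[prev][curr] += 1
            unigrams[prev] += 1
    return bigrams, unigrams

def bigram_prob(bigrams, unigrams, prev, curr, vocab_size, alpha=1.0):
    """P(curr | prev) with add-alpha smoothing to avoid zero probabilities."""
    return (bigrams[prev][curr] + alpha) / (unigrams[prev] + alpha * vocab_size)

# Toy usage: probability of "model" following "language"
corpus = [["a", "language", "model", "predicts", "the", "next", "word"],
          ["a", "neural", "language", "model", "uses", "embeddings"]]
bigrams, unigrams = train_bigram_lm(corpus)
vocab = {w for sent in corpus for w in sent} | {"<s>", "</s>"}
print(bigram_prob(bigrams, unigrams, "language", "model", len(vocab)))
```

Neural language models replace these counts with a learned parametric distribution over the next token, but they are evaluated with the same metrics described next.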
A model's language modeling capability is measured using cross-entropy and perplexity. Commonly used evaluation datasets include WikiText-103, One Billion Word, Text8, and C4, among others.
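Perplexity is simply the exponential of the average per-token cross-entropy, so the two metrics can be computed together. The sketch below assumes you already have the model's probability for each observed token; the variable and function names are illustrative.

```python
import math

def cross_entropy_and_perplexity(token_probs):
    """Average negative log-likelihood (in nats) over the observed tokens,
    and its exponential, i.e. the perplexity (illustrative sketch)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return nll, math.exp(nll)

# Toy usage: a model assigning these probabilities to 4 observed tokens
ce, ppl = cross_entropy_and_perplexity([0.25, 0.1, 0.5, 0.05])
print(f"cross-entropy: {ce:.3f} nats, perplexity: {ppl:.2f}")
```

Lower perplexity means the model assigns higher probability to the observed text; reported numbers differ depending on whether the log is taken in base e or base 2.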
A popular recent benchmark for evaluating language modeling capabilities is SuperGLUE.
For popular and notable state-of-the-art language models, check the leaderboards below for all state-of-the-art models.
Here are some additional readings to go deeper on the task:
- Language Modeling - Lena Voita
(Image credit: Exploring the Limits of Language Modeling)
These leaderboards are used to track progress in Language Modeling.
Libraries: Use these libraries to find Language Modeling models and implementations.
Most implemented papers
Semi-supervised Sequence Learning
In our experiments, we find that long short term memory recurrent networks after being pretrained with the two approaches are more stable and generalize better.
Universal Language Model Fine-tuning for Text Classification
Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.
DARTS: Differentiable Architecture Search
This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.
Generating Sequences With Recurrent Neural Networks
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time.
Deep contextualized word representations
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).
Regularizing and Optimizing LSTM Language Models
Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering.
End-To-End Memory Networks
For the former our approach is competitive with Memory Networks, but with less supervision.
Listen, Attend and Spell
Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.