Language Modelling

2260 papers with code • 37 benchmarks • 135 datasets

Language modeling is the task of predicting the next word or character in a document. Models trained this way can then be applied to a wide range of natural language tasks, such as text generation, text classification, and question answering.

Common language modeling techniques include:

  • N-gram Language Models
  • Neural Language Models
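
As a rough illustration of the N-gram approach listed above, the sketch below builds a bigram model that estimates the probability of the next word from counts of adjacent word pairs. The function names, the toy sentence, and the choice of add-one smoothing are assumptions made for this example and do not come from any particular paper or library.

```python
from collections import Counter, defaultdict


def train_bigram_counts(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_prob(counts, vocab, prev, nxt):
    """Estimate P(nxt | prev) with add-one (Laplace) smoothing over the vocabulary."""
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + 1) / (total + len(vocab))


# Toy corpus (hypothetical); a real model would be trained on far more text.
tokens = "the cat sat on the mat".split()
vocab = set(tokens)
counts = train_bigram_counts(tokens)

# Probability that "cat" follows "the" under the smoothed bigram model.
p_cat = next_token_prob(counts, vocab, "the", "cat")
```

Smoothing keeps unseen word pairs from receiving zero probability, which matters because a single zero would make the likelihood of a whole test document zero.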

A model's language modeling capability is typically measured using cross-entropy and perplexity. Commonly used evaluation datasets include WikiText-103, One Billion Word, Text8, and C4.
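
To make the relationship between the two metrics concrete, here is a minimal sketch that computes cross-entropy as the average negative log-probability a model assigns to each observed token, and perplexity as its exponential. The function names and the example probabilities are hypothetical; natural logarithms are assumed, so the cross-entropy is in nats.

```python
import math


def cross_entropy(token_probs):
    """Average negative log-likelihood per token, in nats."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(token_probs):
    """Perplexity is the exponential of the per-token cross-entropy."""
    return math.exp(cross_entropy(token_probs))


# Hypothetical probabilities a model assigned to the four tokens that actually occurred.
probs = [0.25, 0.25, 0.25, 0.25]
ppl = perplexity(probs)
```

A model that assigns probability 0.25 to every observed token has a perplexity of about 4, which can be read as the model being as uncertain as a uniform choice among four options. Lower values indicate a better fit to the evaluation text.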

A popular recent benchmark for evaluating the language understanding capabilities of such models is SuperGLUE.

( Image credit: Exploring the Limits of Language Modeling )

Most implemented papers

Semi-supervised Sequence Learning

tensorflow/models NeurIPS 2015

In our experiments, we find that long short term memory recurrent networks after being pretrained with the two approaches are more stable and generalize better.

Universal Language Model Fine-tuning for Text Classification

fastai/fastai ACL 2018

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

DARTS: Differentiable Architecture Search

quark0/darts ICLR 2019

This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner.

Generating Sequences With Recurrent Neural Networks

karpathy/char-rnn 4 Aug 2013

This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

pytorch/fairseq 26 Jul 2019

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

End-To-End Memory Networks

facebook/MemNN NeurIPS 2015

For the former our approach is competitive with Memory Networks, but with less supervision.

Regularizing and Optimizing LSTM Language Models

salesforce/awd-lstm-lm ICLR 2018

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering.

Deep contextualized word representations

flairNLP/flair NAACL 2018

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

google-research/bert ICLR 2020

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.

Listen, Attend and Spell

Alexander-H-Liu/End-to-end-ASR-Pytorch 5 Aug 2015

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.