Language Modelling

6987 papers with code • 55 benchmarks • 166 datasets

A language model is a probabilistic model of natural language: it assigns probabilities to sequences of words or tokens. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
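
To make the contrast with neural models concrete, here is a minimal sketch of a word bigram (2-gram) language model in Python, estimating next-word probabilities from raw counts. The toy corpus and the absence of smoothing are simplifications for illustration only.

```python
from collections import defaultdict, Counter

# Toy word bigram language model: estimate P(next word | previous word) from
# raw counts. The corpus is a made-up placeholder; real models use far larger
# corpora plus smoothing for unseen word pairs.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def prob(nxt: str, prev: str) -> float:
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(prob("cat", "the"))  # 0.25: "the" is followed by cat, mat, dog, rug
```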

Libraries

Use these libraries to find Language Modelling models and implementations
See all 22 libraries.

Most implemented papers

A Neural Algorithm of Artistic Style

jcjohnson/neural-style 26 Aug 2015

In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image.

Semi-supervised Sequence Learning

tensorflow/models NeurIPS 2015

In our experiments, we find that long short term memory recurrent networks after being pretrained with the two approaches are more stable and generalize better.

LoRA: Low-Rank Adaptation of Large Language Models

microsoft/LoRA ICLR 2022

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
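
As an illustration of the idea described above, the following is a minimal PyTorch-style sketch of a linear layer augmented with a low-rank update: the pre-trained weight is frozen and only the rank-r factors are trainable. The class name, the rank r, and the scaling alpha are illustrative choices, not the reference implementation in microsoft/LoRA.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update (sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weights
        # rank-r decomposition: delta_W = B @ A, with B zero-initialized so
        # training starts from the unmodified pre-trained behaviour
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # only A and B receive gradients; the base projection stays fixed
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))   # wrap a (pretend) pre-trained projection
print(layer(torch.randn(2, 768)).shape)   # torch.Size([2, 768])
```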

RoBERTa: A Robustly Optimized BERT Pretraining Approach

pytorch/fairseq 26 Jul 2019

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Language Models are Few-Shot Learners

openai/gpt-3 NeurIPS 2020

By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do.
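
The few-shot setting amounts to specifying a task entirely in the prompt. Below is a minimal sketch of that prompt construction for a translation task, mirroring the illustrative example in the paper; calling an actual model is intentionally left out.

```python
# Few-shot prompting: the task is described in natural language and shown via
# a handful of in-context examples, with no gradient updates to the model.
examples = [
    ("sea otter", "loutre de mer"),
    ("plush girafe", "girafe peluche"),
]
query = "cheese"

prompt = "Translate English to French:\n"
prompt += "".join(f"{en} => {fr}\n" for en, fr in examples)
prompt += f"{query} =>"

print(prompt)  # feed this prompt to a pretrained language model and read its completion
```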

Universal Language Model Fine-tuning for Text Classification

fastai/fastai ACL 2018

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

Generating Sequences With Recurrent Neural Networks

karpathy/char-rnn 4 Aug 2013

This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time.
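
A minimal sketch of that one-point-at-a-time generation loop, using an untrained character-level LSTM in PyTorch; the vocabulary, model size, and sampling length are placeholder choices, so the output stays gibberish until the model is trained.

```python
import torch
import torch.nn as nn

# Character-level autoregressive sampler: emit one symbol at a time, feeding
# each sample back in as the next input.
vocab = list("abcdefghijklmnopqrstuvwxyz ")
stoi = {c: i for i, c in enumerate(vocab)}

class CharRNN(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, idx, state=None):
        h, state = self.lstm(self.embed(idx), state)
        return self.head(h), state

model = CharRNN(len(vocab))
idx = torch.tensor([[stoi["h"]]])       # seed character
state, generated = None, []
with torch.no_grad():
    for _ in range(20):                 # predict one character at a time
        logits, state = model(idx, state)
        probs = torch.softmax(logits[:, -1], dim=-1)
        idx = torch.multinomial(probs, num_samples=1)
        generated.append(vocab[idx.item()])
print("".join(generated))
```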

DARTS: Differentiable Architecture Search

quark0/darts ICLR 2019

This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner.
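
A minimal sketch of the continuous relaxation behind this formulation: each discrete choice among candidate operations is replaced by a softmax-weighted mixture whose architecture weights can be optimized by gradient descent. The candidate operation set here is an illustrative subset, not the full DARTS search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One DARTS-style edge: a softmax-weighted mixture of candidate operations."""

    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                 # skip connection
            nn.Conv2d(channels, channels, 3, padding=1),   # 3x3 convolution
            nn.MaxPool2d(3, stride=1, padding=1),          # 3x3 max pooling
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)              # relaxed (continuous) choice
        return sum(w * op(x) for w, op in zip(weights, self.ops))

x = torch.randn(1, 16, 8, 8)
print(MixedOp(16)(x).shape)  # torch.Size([1, 16, 8, 8])
```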

Deep contextualized word representations

flairNLP/flair NAACL 2018

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).
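
A minimal sketch of how such representations are combined downstream in this approach: a learned, softmax-normalized weighted sum over the layers of a pretrained bidirectional language model. The layer activations below are random placeholders standing in for the biLM outputs.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Learned softmax-weighted sum of biLM layer activations, with a global scale."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(num_layers))  # per-layer mixing weights
        self.gamma = nn.Parameter(torch.ones(1))        # overall scale

    def forward(self, layers: torch.Tensor) -> torch.Tensor:
        # layers: (num_layers, seq_len, dim)
        w = torch.softmax(self.s, dim=0)
        return self.gamma * (w[:, None, None] * layers).sum(dim=0)

layers = torch.randn(3, 7, 256)    # 3 biLM layers, 7 tokens, 256-dim states
print(ScalarMix(3)(layers).shape)  # torch.Size([7, 256])
```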

Regularizing and Optimizing LSTM Language Models

salesforce/awd-lstm-lm ICLR 2018

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering.