Language Modelling
4455 papers with code • 51 benchmarks • 157 datasets
Language Modeling is the task of predicting the next word or character in a document. Models trained on this objective can then be applied to a wide range of natural language tasks such as text generation, text classification, and question answering.
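To make the task concrete, here is a minimal, self-contained sketch of next-word prediction with a bigram count model in Python. The toy corpus and the `predict_next` helper are illustrative assumptions, not taken from any benchmark or library:

```python
from collections import Counter, defaultdict

# Toy corpus; real language models train on corpora like WikiText-103.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigram frequencies: how often each word follows each context word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = bigrams[word]
    total = sum(counts.values())
    best, freq = counts.most_common(1)[0]  # ties resolved by insertion order
    return best, freq / total

print(predict_next("the"))  # ('cat', 0.25) on this toy corpus
```

Neural language models replace these raw counts with a learned probability distribution over the whole vocabulary, but the prediction task itself is the same.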
Historically, language modelling was done with N-gram language models (which still have niche uses). Neural language models took over in the 2010s, and since the early 2020s state-of-the-art results have been achieved almost exclusively with large language models (LLMs).
A model's language modeling capability is measured with cross-entropy and perplexity. Common evaluation datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile.
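The two metrics are directly related: perplexity is the exponential of the average per-token cross-entropy. A short sketch, using made-up token probabilities purely for illustration:

```python
import math

# Hypothetical probabilities a model assigned to each token of a held-out text.
token_probs = [0.2, 0.5, 0.1, 0.4]

# Cross-entropy: average negative log-likelihood per token (in nats).
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity is the exponentiated cross-entropy; lower is better.
perplexity = math.exp(cross_entropy)

print(f"cross-entropy: {cross_entropy:.3f} nats, perplexity: {perplexity:.2f}")
```

Intuitively, a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens at each step.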
Check the benchmarks below for notable state-of-the-art models.
Here are some additional readings to go deeper on the task:
- Language Modeling - Lena Voita
(Image credit: Exploring the Limits of Language Modeling)
Libraries
Use these libraries to find Language Modelling models and implementations.
Datasets
Subtasks
Latest papers
Future Language Modeling from Temporal Document History
While there are many automated systems for predicting future numerical data, such as weather, stock prices, and demand for products, there is relatively little work in automatically predicting textual data.
Forcing Diffuse Distributions out of Language Models
Despite being trained specifically to follow user instructions, today's language models perform poorly when instructed to produce random outputs.
Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.
Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models
Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets.
Compression Represents Intelligence Linearly
We open-source our compression datasets as well as our data collection pipelines to help future researchers assess compression properly.
Memory Sharing for Large Language Model based Agents
In the realm of artificial intelligence, the adaptation of Large Language Model (LLM)-based agents to execute tasks via natural language prompts represents a significant advancement, notably eliminating the need for explicit retraining or fine-tuning for fixed-answer tasks such as common-sense questions and yes/no queries.
in2IN: Leveraging individual Information to Generate Human INteractions
For this, we introduce in2IN, a novel diffusion model for human-human motion generation which is conditioned not only on the textual description of the overall interaction but also on the individual descriptions of the actions performed by each person involved in the interaction.
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
In this paper, we consider the problem of visual representation learning for computational pathology by exploiting large-scale image-text pairs gathered from public resources, along with domain-specific knowledge in pathology.
A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions
The task of chemical reaction predictions (CRPs) plays a pivotal role in advancing drug discovery and material science.
LegalPro-BERT: Classification of Legal Provisions by fine-tuning BERT Large Language Model
Contract analysis requires the identification and classification of key provisions and paragraphs within an agreement.