Language Modelling
4482 papers with code • 51 benchmarks • 157 datasets
Language Modeling is the task of predicting the next word or character in a document. This technique can be used to train language models that can further be applied to a wide range of natural language tasks like text generation, text classification, and question answering.
Historically, language modelling was done with N-gram language models (which still have niche uses), but since the 2010s neural language models took over, and starting from the 2020s SOTA was achieved exclusively with large language models (LLMs).
A model's language modeling capability is measured using cross-entropy and perplexity. Some datasets to evaluate language modeling are WikiText-103, One Billion Word, Text8, C4, The Pile, among others.
Some notable state-of-the-art language models include:
Check below for all state-of-the-art models.
Here are some additional readings to go deeper on the task:
- Language Modeling - Lena Voita
( Image credit: Exploring the Limits of Language Modeling )
Libraries
Use these libraries to find Language Modelling models and implementationsDatasets
Subtasks
Latest papers
CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies
To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale.
Setting up the Data Printer with Improved English to Ukrainian Machine Translation
To build large language models for Ukrainian we need to expand our corpora with large amounts of new algorithmic tasks expressed in natural language.
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
To this end, we release OpenELM, a state-of-the-art open language model.
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
This exploration holds the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing performance degradation problems that suffer in LLM compression.
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Tokenization is widely used in large language models because it significantly improves performance.
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
To address these issues, we introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
On single-turn dialogue and summarization, a SAMI-trained mistral-7b outperforms the initial pretrained model, with win rates between 66% and 77%.
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Within this space, we show that there is not a linear relationship between GPU cost and performance, and identify three key LLM service characteristics that significantly affect which GPU type is the most cost effective: model request size, request rate, and latency service-level objective (SLO).
CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment
However, this common strategy yields suboptimal results due to the inherent struggle of these backbones to capture the subtle cues essential for AQA.
A Survey on the Memory Mechanism of Large Language Model based Agents
Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions.