Causal Language Modeling
13 papers with code • 0 benchmarks • 4 datasets
Most implemented papers
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
In this study, we attempt to render the training of LLMs for program synthesis more efficient by unifying four key components: (1) model architectures, (2) learning methods, (3) infill sampling, and (4) data distributions.
Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling
In comparison to TCN and WaveNet, our network consistently saves memory and computation time, with training and inference speed-ups of over 4x in the audio generation experiment in particular, while achieving comparable performance in all tasks.
Prix-LM: Pretraining for Multilingual Knowledge Base Construction
To achieve this, it is crucial to represent multilingual knowledge in a shared/unified space.
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
Previous works on sentence scoring mainly adopted either causal language modeling (CLM), as in GPT, or masked language modeling (MLM), as in BERT. Both have limitations: 1) CLM uses only unidirectional information to estimate a sentence's probability, ignoring bidirectional context, which hurts scoring quality; 2) MLM can estimate the probability of only some tokens at a time, so scoring a whole sentence requires multiple forward passes, incurring large computation and time costs.
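The efficiency gap described above can be made concrete with a toy sketch (not the Transcormer method): a causal LM scores a whole sentence in one left-to-right pass, while MLM-style pseudo-log-likelihood needs one forward pass per masked token. The bigram table and probabilities below are invented stand-ins for a real model.

```python
import math

# Invented toy bigram "model"; probabilities are illustrative only.
BIGRAM = {
    ("<s>", "the"): 0.5, ("the", "cat"): 0.2,
    ("cat", "sat"): 0.3, ("sat", "</s>"): 0.4,
}

def clm_score(tokens):
    """Causal LM scoring: sum log P(token | prefix) in ONE forward pass."""
    logp, passes = 0.0, 1
    prev = "<s>"
    for tok in tokens + ["</s>"]:
        logp += math.log(BIGRAM.get((prev, tok), 1e-6))
        prev = tok
    return logp, passes

def mlm_score(tokens):
    """MLM-style pseudo-log-likelihood: mask each position in turn,
    so each token costs one separate forward pass."""
    logp, passes = 0.0, 0
    prev = "<s>"
    for tok in tokens + ["</s>"]:
        passes += 1
        logp += math.log(BIGRAM.get((prev, tok), 1e-6))
        prev = tok
    return logp, passes

sentence = ["the", "cat", "sat"]
clm_lp, clm_passes = clm_score(sentence)
mlm_lp, mlm_passes = mlm_score(sentence)
print(clm_passes, mlm_passes)  # 1 vs len(sentence)+1 passes
```

The point of the sketch is the pass count, not the scores themselves: for an n-token sentence, CLM costs one pass where MLM costs n+1, which is the computation overhead the abstract refers to.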
Language Models are General-Purpose Interfaces
Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.
Self-Supervised Learning of Brain Dynamics from Broad Neuroimaging Data
At their core, these frameworks learn the dynamics of brain activity by modeling sequences of activity akin to how sequences of text are modeled in NLP.
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks.
Suffix Retrieval-Augmented Language Modeling
SUREALM employs an embedding retriever to search for training sentences in a data store that share similar word history during sequence generation.
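A hypothetical sketch of the retrieval step described above (not the SUREALM implementation): given the word history generated so far, find the stored sentence whose prefix is most similar, so that its suffix can inform the next prediction. A bag-of-words cosine similarity stands in for a learned embedding retriever, and the data store contents are invented.

```python
from collections import Counter
import math

# Invented mini data store of "training sentences".
DATA_STORE = [
    "the stock market closed higher today",
    "the cat sat on the mat",
    "heavy rain is expected later today",
]

def embed(text):
    """Bag-of-words 'embedding' (placeholder for a trained encoder)."""
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_suffix(history, k_prefix=4):
    """Return the suffix of the stored sentence whose first k_prefix
    words best match the current word history."""
    h = embed(history)
    best = max(
        DATA_STORE,
        key=lambda s: cosine(h, embed(" ".join(s.split()[:k_prefix]))),
    )
    return " ".join(best.split()[k_prefix:])

print(retrieve_suffix("the stock market closed"))  # "higher today"
```

In the real system the retrieved suffix would be attended to during generation rather than returned verbatim; the sketch only shows the prefix-matching idea.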
Cross-lingual Similarity of Multilingual Representations Revisited
Related works used indices such as CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models.
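For reference, linear CKA (one of the similarity indices mentioned) has a compact closed form; the sketch below is a minimal implementation, not the paper's exact experimental setup. Rows are examples, columns are features, and features are mean-centered before comparison.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation
    matrices of shape (n_examples, n_features)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# CKA is invariant to orthogonal transforms and isotropic scaling,
# which is why it suits comparing representations across models.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
print(round(linear_cka(X, 2.0 * X @ Q), 6))  # 1.0
```

This invariance property is what makes CKA usable across differently trained (and differently rotated) multilingual encoders.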
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Our backbone, based on a reference Flan-T5-11B architecture, learns a universal representation of the video that is a non-linear sum of the encoder models.