Causal Language Modeling

13 papers with code • 12 benchmarks • 4 datasets

Causal language modeling (CLM) trains a model to predict the next token in a sequence given only the tokens that precede it, so the probability of a text factorizes left-to-right into per-token conditionals. It is the pre-training objective used by decoder-only models such as GPT.
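Concretely, the objective reduces to a cross-entropy over shifted targets. The sketch below (PyTorch; the logits and token IDs are placeholder tensors from any decoder-only model, not a specific implementation) shows this next-token loss.

```python
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids):
    """Next-token prediction: position t is scored against the token at t+1.

    logits:    (batch, seq_len, vocab_size) from a decoder-only model
    input_ids: (batch, seq_len), the same tokens that were fed to the model
    """
    shift_logits = logits[:, :-1, :]   # predictions for positions 0..T-2
    shift_labels = input_ids[:, 1:]    # targets are tokens 1..T-1
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```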

Most implemented papers

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

salesforce/CodeGen 3 May 2023

In this study, we attempt to render the training of LLMs for program synthesis more efficient by unifying four key components: (1) model architectures, (2) learning methods, (3) infill sampling, and (4) data distributions.

Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling

f90/Seq-U-Net 14 Nov 2019

In comparison to TCN and Wavenet, our network consistently saves memory and computation time, with training and inference speed-ups of over 4x in the audio generation experiment in particular, while achieving comparable performance in all tasks.

Prix-LM: Pretraining for Multilingual Knowledge Base Construction

luka-group/prix-lm ACL 2022

To achieve this, it is crucial to represent multilingual knowledge in a shared/unified space.

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

microsoft/CyBERTron-LM 25 May 2022

Previous works on sentence scoring mainly adopted either causal language modeling (CLM) like GPT or masked language modeling (MLM) like BERT, which have some limitations: 1) CLM only utilizes unidirectional information for the probability estimation of a sentence without considering bidirectional context, which affects the scoring quality; 2) MLM can only estimate the probability of partial tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs a high computation and time cost.
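As a rough illustration of the contrast described above (not the paper's Transcormer model), the sketch below scores a sentence with an off-the-shelf causal LM in a single forward pass by summing next-token log-probabilities; a masked LM would instead require one forward pass per masked position.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def clm_sentence_logprob(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    logits = model(ids).logits                              # one forward pass
    log_probs = logits[:, :-1].log_softmax(-1)              # position t predicts token t+1
    targets = ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()                            # unidirectional sentence score

print(clm_sentence_logprob("The cat sat on the mat."))
```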

Language Models are General-Purpose Interfaces

microsoft/unilm 13 Jun 2022

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Self-Supervised Learning of Brain Dynamics from Broad Neuroimaging Data

athms/learning-from-brains 22 Jun 2022

At their core, these frameworks learn the dynamics of brain activity by modeling sequences of activity akin to how sequences of text are modeled in NLP.

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

amazon-science/alexa-teacher-models 2 Aug 2022

In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks.

Suffix Retrieval-Augmented Language Modeling

victor-wang-902/surealm 6 Nov 2022

SUREALM employs an embedding retriever to search for training sentences in a data store that share a similar word history during sequence generation.
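A minimal sketch of this kind of prefix-similarity lookup is shown below; the data store contents, embedding dimensionality, and function names are placeholders for illustration, not SUREALM's actual implementation.

```python
import numpy as np

# Hypothetical data store: one embedding per stored prefix, paired with the suffix that followed it.
store_embeddings = np.random.randn(10_000, 384)            # placeholder sentence-encoder vectors
store_suffixes = [f"suffix_{i}" for i in range(10_000)]    # placeholder continuations

def retrieve_suffixes(prefix_embedding, k=4):
    """Return suffixes whose stored prefixes are most similar to the current word history."""
    sims = store_embeddings @ prefix_embedding
    sims /= np.linalg.norm(store_embeddings, axis=1) * np.linalg.norm(prefix_embedding)
    top = np.argsort(-sims)[:k]                             # cosine nearest neighbours
    return [store_suffixes[i] for i in top]
```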

Cross-lingual Similarity of Multilingual Representations Revisited

TartuNLP/xsim 4 Dec 2022

Related works have used similarity indices such as CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models.
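For reference, linear CKA between two representation matrices of the same examples can be computed as below; this is the standard linear CKA definition, not code taken from the paper's repository.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n, d1) and Y (n, d2) of the same n examples."""
    X = X - X.mean(axis=0)                       # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2   # alignment of cross-covariances
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```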

Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts

KastanDay/video-pretrained-transformer 24 Mar 2023

Our backbone, based on a reference Flan-T5-11B architecture, learns a universal representation of the video that is a non-linear sum of the outputs of the encoder models.