Causal Language Modeling

13 papers with code • 12 benchmarks • 4 datasets

Causal language modeling (CLM) trains a model to predict the next token in a sequence given only the tokens that precede it, so the probability of a text factorizes left-to-right into per-token conditionals. It is the pre-training objective used by decoder-only models such as GPT.
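Concretely, the objective reduces to a cross-entropy over shifted targets. The sketch below (PyTorch; the logits and token IDs are placeholder tensors from any decoder-only model, not a specific implementation) shows this next-token loss.

```python
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids):
    """Next-token prediction: position t is scored against the token at t+1.

    logits:    (batch, seq_len, vocab_size) from a decoder-only model
    input_ids: (batch, seq_len), the same tokens that were fed to the model
    """
    shift_logits = logits[:, :-1, :]   # predictions for positions 0..T-2
    shift_labels = input_ids[:, 1:]    # targets are tokens 1..T-1
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```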

Most implemented papers

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

salesforce/CodeGen 3 May 2023

In this study, we attempt to render the training of LLMs for program synthesis more efficient by unifying four key components: (1) model architectures, (2) learning methods, (3) infill sampling, and (4) data distributions.

Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling

f90/Seq-U-Net 14 Nov 2019

In comparison to TCN and Wavenet, our network consistently saves memory and computation time, with training and inference speed-ups of over 4x in the audio generation experiment in particular, while achieving comparable performance in all tasks.

Prix-LM: Pretraining for Multilingual Knowledge Base Construction

luka-group/prix-lm ACL 2022

To achieve this, it is crucial to represent multilingual knowledge in a shared/unified space.

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

microsoft/CyBERTron-LM 25 May 2022

Previous works on sentence scoring mainly adopted either causal language modeling (CLM) like GPT or masked language modeling (MLM) like BERT, which have some limitations: 1) CLM only utilizes unidirectional information for the probability estimation of a sentence without considering bidirectional context, which affects the scoring quality; 2) MLM can only estimate the probability of partial tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs a high computation and time cost.
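As a rough illustration of the contrast described above (not the paper's Transcormer model), the sketch below scores a sentence with an off-the-shelf causal LM in a single forward pass by summing next-token log-probabilities; a masked LM would instead require one forward pass per masked position.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def clm_sentence_logprob(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    logits = model(ids).logits                              # one forward pass
    log_probs = logits[:, :-1].log_softmax(-1)              # position t predicts token t+1
    targets = ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()                            # unidirectional sentence score

print(clm_sentence_logprob("The cat sat on the mat."))
```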

Language Models are General-Purpose Interfaces

microsoft/unilm 13 Jun 2022

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Self-Supervised Learning of Brain Dynamics from Broad Neuroimaging Data

athms/learning-from-brains 22 Jun 2022

At their core, these frameworks learn the dynamics of brain activity by modeling sequences of activity akin to how sequences of text are modeled in NLP.

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

amazon-science/alexa-teacher-models 2 Aug 2022

In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks.

Suffix Retrieval-Augmented Language Modeling

victor-wang-902/surealm 6 Nov 2022

SUREALM employs an embedding retriever to search for training sentences in a data store that share a similar word history during sequence generation.
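A minimal sketch of this kind of prefix-similarity lookup is shown below; the data store contents, embedding dimensionality, and function names are placeholders for illustration, not SUREALM's actual implementation.

```python
import numpy as np

# Hypothetical data store: one embedding per stored prefix, paired with the suffix that followed it.
store_embeddings = np.random.randn(10_000, 384)            # placeholder sentence-encoder vectors
store_suffixes = [f"suffix_{i}" for i in range(10_000)]    # placeholder continuations

def retrieve_suffixes(prefix_embedding, k=4):
    """Return suffixes whose stored prefixes are most similar to the current word history."""
    sims = store_embeddings @ prefix_embedding
    sims /= np.linalg.norm(store_embeddings, axis=1) * np.linalg.norm(prefix_embedding)
    top = np.argsort(-sims)[:k]                             # cosine nearest neighbours
    return [store_suffixes[i] for i in top]
```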

Cross-lingual Similarity of Multilingual Representations Revisited

TartuNLP/xsim 4 Dec 2022

Related works have used similarity indices such as CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models.
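For reference, linear CKA between two representation matrices of the same examples can be computed as below; this is the standard linear CKA definition, not code taken from the paper's repository.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n, d1) and Y (n, d2) of the same n examples."""
    X = X - X.mean(axis=0)                       # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2   # alignment of cross-covariances
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```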

Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts

KastanDay/video-pretrained-transformer 24 Mar 2023

Our backbone, based on a reference Flan-T5-11B architecture, learns a universal representation of the video that is a non-linear sum of the outputs of the encoder models.