LAMBADA

12 papers with code • 1 benchmark • 1 dataset

Most implemented papers

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

NVIDIA/Megatron-LM 17 Sep 2019

To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9 billion parameter model similar to BERT.
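
A minimal sketch of the tensor-parallel MLP split that the Megatron-LM paper describes: the first weight matrix is partitioned by columns and the second by rows. Each device can then apply the nonlinearity locally, and the partial outputs only need to be summed (an all-reduce) at the end. Plain Python lists stand in for matrices and devices, ReLU stands in for the GeLU used in the paper, and every function name is hypothetical. This is not code from NVIDIA/Megatron-LM.

```python
# Hypothetical illustration: lists of rows stand in for matrices and
# list positions stand in for devices. Not Megatron-LM's implementation.

def matmul(a, b):
    """Multiply a (m x k) by b (k x n); both are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    """Elementwise ReLU (a stand-in for GeLU)."""
    return [[max(0.0, v) for v in row] for row in m]

def add(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split_columns(w, parts):
    """Split w into `parts` equal column blocks."""
    size = len(w[0]) // parts
    return [[row[p * size:(p + 1) * size] for row in w] for p in range(parts)]

def split_rows(w, parts):
    """Split w into `parts` equal row blocks."""
    size = len(w) // parts
    return [w[p * size:(p + 1) * size] for p in range(parts)]

def mlp_serial(x, a, b):
    """Reference MLP on a single device: relu(x @ a) @ b."""
    return matmul(relu(matmul(x, a)), b)

def mlp_tensor_parallel(x, a, b, parts=2):
    """Same MLP with a split by columns and b split by rows across `parts` devices."""
    if len(a[0]) % parts:
        raise ValueError("hidden size must be divisible by parts")
    partials = [
        matmul(relu(matmul(x, a_p)), b_p)  # computed independently on device p
        for a_p, b_p in zip(split_columns(a, parts), split_rows(b, parts))
    ]
    out = partials[0]
    for partial in partials[1:]:  # simulated all-reduce: sum the partial outputs
        out = add(out, partial)
    return out
```

Because the nonlinearity is elementwise, relu([xA1, xA2]) equals [relu(xA1), relu(xA2)], so `mlp_serial` and `mlp_tensor_parallel` agree up to floating-point rounding for any hidden size divisible by `parts`.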

Universal Transformers

tensorflow/tensor2tensor ICLR 2019

Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine translation, with the added advantage that they concurrently process all inputs in the sequence, leading to easy parallelization and faster training times.

The LAMBADA dataset: Word prediction requiring a broad discourse context

keyonvafa/sequential-rationales ACL 2016

We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task.
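
The word prediction task described above comes down to exact-match accuracy on the final word of each passage. The sketch below assumes each passage is a whitespace-separated string whose last token is the target word, and treats `predict_last_word` as a hypothetical model callback. It is not the dataset's official evaluation script.

```python
from typing import Callable, Iterable

def split_passage(passage: str) -> tuple[str, str]:
    """Split a passage into (context, target), where target is the final token."""
    context, _, target = passage.rstrip().rpartition(" ")
    return context, target

def lambada_accuracy(passages: Iterable[str],
                     predict_last_word: Callable[[str], str]) -> float:
    """Fraction of passages whose final word is predicted exactly from the context."""
    correct = total = 0
    for passage in passages:
        context, target = split_passage(passage)
        total += 1
        # The model only sees the context; the target word is held out.
        if predict_last_word(context).strip() == target:
            correct += 1
    return correct / total if total else 0.0
```

For example, a trivial predictor that always answers "dog" scores 0.5 on the two passages "the cat chased the dog" and "she opened the door".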

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

LUMII-Syslab/RSE 6 Apr 2020

Attention is a commonly used mechanism in sequence processing, but it is of O(n^2) complexity which prevents its application to long sequences.
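
To make the O(n^2) cost concrete, this sketch compares the number of query-key scores in full self-attention with the number of two-input switch applications in log2(n) layers of n/2 switches, the classic shuffle-exchange wiring. It assumes n is a power of two. The papers' actual architectures may stack several such layers, so treat the constants as illustrative; only the asymptotic gap carries over.

```python
def attention_pairs(n: int) -> int:
    """Number of query-key scores computed by full self-attention over n positions."""
    return n * n

def shuffle_exchange_switches(n: int) -> int:
    """Switch applications in log2(n) layers of n/2 two-input switches."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    layers = n.bit_length() - 1  # log2(n) for a power of two
    return (n // 2) * layers
```

For n = 1024 this gives 1,048,576 attention scores against 5,120 switch applications.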

Training Compute-Optimal Large Language Models

karpathy/llama2.c 29 Mar 2022

We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
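
A back-of-the-envelope version of the question posed above. It assumes the common approximation of C ≈ 6·N·D training FLOPs for N parameters and D tokens, plus the roughly 20-tokens-per-parameter ratio often quoted from this paper. Both are simplifications, not the paper's fitted scaling laws.

```python
import math

def compute_optimal(flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Return (params, tokens) assuming C = 6*N*D and D = tokens_per_param * N."""
    # Substituting D = r*N into C = 6*N*D gives C = 6*r*N^2, so N = sqrt(C / (6*r)).
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens
```

With a budget of 5.76e23 FLOPs, `compute_optimal` returns about 6.9e10 parameters and 1.4e12 tokens. Changing `tokens_per_param` shows how sensitive the split is to that assumed ratio.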

Entity Tracking Improves Cloze-style Reading Comprehension

harvardnlp/readcomp EMNLP 2018

Reading comprehension tasks test the ability of models to process long-term context and remember salient information.

Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time

LUMII-Syslab/shuffle-exchange 18 Jul 2019

A key requirement in sequence to sequence processing is the modeling of long range dependencies.
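
In a classic shuffle-exchange network, each layer applies a perfect shuffle (interleaving the two halves of the sequence, equivalently rotating each index's bits left by one) followed by a layer of two-input switches on adjacent pairs. After log2(n) such layers, any position can influence any other. The sketch shows that wiring with binary swap switches; the neural networks in these papers replace the swaps with learned switch units, and the function names here are hypothetical.

```python
def _log2_exact(n: int) -> int:
    """Return k with 2**k == n, requiring n >= 2."""
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    return n.bit_length() - 1

def perfect_shuffle(xs: list) -> list:
    """Interleave the halves: [a0..a(m-1), b0..b(m-1)] -> [a0, b0, a1, b1, ...]."""
    _log2_exact(len(xs))
    half = len(xs) // 2
    out = []
    for a, b in zip(xs[:half], xs[half:]):
        out.extend((a, b))
    return out

def rotate_left(i: int, k: int) -> int:
    """Cyclically rotate the k-bit index i one position to the left."""
    return ((i << 1) | (i >> (k - 1))) & ((1 << k) - 1)

def perfect_shuffle_by_bits(xs: list) -> list:
    """Same permutation, expressed as a left rotation of each index's bits."""
    k = _log2_exact(len(xs))
    out = [None] * len(xs)
    for i, x in enumerate(xs):
        out[rotate_left(i, k)] = x
    return out

def exchange(xs: list, swap: list) -> list:
    """One switch layer: pair (2j, 2j+1) is swapped when swap[j] is True."""
    out = list(xs)
    for j, flag in enumerate(swap):
        if flag:
            out[2 * j], out[2 * j + 1] = out[2 * j + 1], out[2 * j]
    return out
```

For eight elements, `perfect_shuffle(list(range(8)))` yields [0, 4, 1, 5, 2, 6, 3, 7], and the bit-rotation form produces the same order.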

Not Enough Data? Deep Learning to the Rescue!

makcedward/nlpaug 8 Nov 2019

Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks.
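
Assuming the method follows the commonly described recipe, it works in three steps. First, fine-tune a generative language model on class-conditioned strings (label, separator, text). Second, sample new texts for each label. Third, keep only the samples that a classifier trained on the original data assigns to the intended label with high confidence. The sketch covers only the formatting and filtering steps. The `SEP`/`EOS` markers and the `classify` callback are hypothetical placeholders, not the paper's exact tokens or API.

```python
from typing import Callable

# Hypothetical classifier interface: text -> (predicted label, confidence).
Classifier = Callable[[str], tuple[str, float]]

def format_training_example(label: str, text: str,
                            sep: str = " SEP ", eos: str = " EOS") -> str:
    """Build a class-conditioned training string for the generator (placeholder markers)."""
    return f"{label}{sep}{text}{eos}"

def filter_generated(candidates: dict[str, list[str]], classify: Classifier,
                     keep_per_label: int) -> dict[str, list[str]]:
    """Per intended label, keep the generated texts the classifier most confidently agrees with."""
    kept = {}
    for label, texts in candidates.items():
        scored = []
        for text in texts:
            predicted, confidence = classify(text)
            if predicted == label:  # discard samples the classifier assigns elsewhere
                scored.append((confidence, text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        kept[label] = [text for _, text in scored[:keep_per_label]]
    return kept
```

The confidence filter is the main safeguard: generated texts that drift away from their conditioning label are dropped before they reach the augmented training set.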

Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time

LUMII-Syslab/shuffle-exchange NeurIPS 2019

A key requirement in sequence to sequence processing is the modeling of long range dependencies.

The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models

microsoft/DeepSpeed 13 Aug 2021

To reduce the wall-clock training time, a common practice is to increase the batch size and learning rate.
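
The paper's title names the remedy it studies: sequence length warmup, which trains on short sequences early and grows them to full length. A minimal sketch, assuming a linear schedule, truncation of each batch to the scheduled length, and rounding down to a multiple of 8 for hardware efficiency. All three are illustrative choices, and none of this is DeepSpeed's API.

```python
def seq_len_at(step: int, start_len: int, full_len: int,
               warmup_steps: int, multiple_of: int = 8) -> int:
    """Linearly grow the training sequence length from start_len to full_len."""
    if step >= warmup_steps:
        return full_len
    length = start_len + (full_len - start_len) * step / warmup_steps
    length = int(length) // multiple_of * multiple_of  # round down to a friendly size
    return max(start_len, min(full_len, length))

def truncate_batch(batch: list[list[int]], length: int) -> list[list[int]]:
    """Truncate every token sequence in the batch to the scheduled length."""
    return [seq[:length] for seq in batch]
```

With `start_len=64`, `full_len=1024` and `warmup_steps=1000`, the schedule gives 64 tokens at step 0, 544 at step 500, and the full 1024 from step 1000 onward.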