Winogrande

12 papers with code • 0 benchmarks • 0 datasets


Most implemented papers

WinoGrande: An Adversarial Winograd Schema Challenge at Scale

vered1986/self_talk 24 Jul 2019

The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations.
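The bias-reduction step can be sketched as adversarial filtering over precomputed embeddings: cheap linear probes are repeatedly fit on random splits, each instance is scored by how often held-out probes classify it correctly, and the most predictable instances are discarded. This is a minimal illustrative sketch, not the paper's exact AfLite recipe; the nearest-centroid probe, the function names, and all thresholds are assumptions.

```python
import numpy as np

def predictability_scores(X, y, idx, n_splits, rng):
    """Fraction of held-out linear probes that classify each instance correctly."""
    hits = np.zeros(len(idx))
    tries = np.zeros(len(idx))
    for _ in range(n_splits):
        mask = rng.random(len(idx)) < 0.8          # random train / held-out split
        tr, ho = idx[mask], idx[~mask]
        if len(ho) == 0 or len(np.unique(y[tr])) < 2:
            continue
        # cheap linear probe: nearest class centroid in embedding space
        classes = np.unique(y[tr])
        cents = np.stack([X[tr[y[tr] == c]].mean(axis=0) for c in classes])
        d = ((X[ho][:, None, :] - cents[None]) ** 2).sum(-1)
        pred = classes[d.argmin(1)]
        hits[~mask] += (pred == y[ho])
        tries[~mask] += 1
    return np.where(tries > 0, hits / np.maximum(tries, 1), 0.0)

def aflite(X, y, n_rounds=5, n_splits=16, cutoff=0.75, seed=0):
    """AfLite-style filtering sketch: drop instances that probes find too easy."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    for _ in range(n_rounds):
        s = predictability_scores(X, y, idx, n_splits, rng)
        keep = s < cutoff          # high score = machine-detectable association
        if keep.all():
            break
        idx = idx[keep]
    return idx                     # indices of the debiased subset
```

Run on a mix of trivially separable and genuinely random instances, the filter removes the separable ones, mimicking how AfLite strips examples solvable from embedding artifacts alone.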

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

allenai/dolma NA 2021

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

ST-MoE: Designing Stable and Transferable Sparse Expert Models

tensorflow/mesh 17 Feb 2022

However, advancing the state of the art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning.

UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark

allenai/rainbow 24 Mar 2021

First, we propose a new multitask benchmark, RAINBOW, to promote research on commonsense models that generalize well over multiple tasks and datasets.

Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations in a Label-Abundant Setup

ydyordanov/few-shot-nles 12 Dec 2021

A potential solution is the few-shot out-of-domain transfer of NLEs from a parent task with many NLEs to a child task.

Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations

swarnahub/explanationhardness 14 Nov 2022

We observe that (1) GPT-3 explanations are as grammatical as human explanations regardless of the hardness of the test samples, (2) for easy examples, GPT-3 generates highly supportive explanations but human explanations are more generalizable, and (3) for hard examples, human explanations are significantly better than GPT-3 explanations both in terms of label-supportiveness and generalizability judgements.

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

optimalscale/lmflow 26 Mar 2024

To address this gap, we investigate the layerwise properties of LoRA on fine-tuning tasks and observe an unexpected but consistent skewness of weight norms across different layers.
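The core idea can be sketched as a per-step layer scheduler: a couple of layers (here the first and last, by assumption) are always trained, while only a small sample of the middle blocks is unfrozen on each optimization step, shrinking optimizer-state memory roughly in proportion to the frozen fraction. A minimal sketch, with illustrative names; the sampling rule and always-active layers are assumptions, not the paper's exact configuration.

```python
import numpy as np

def lisa_active_layers(n_layers, n_active, step_seed):
    """Pick which blocks to unfreeze for one optimization step.

    LISA-style layerwise sampling (sketch): indices 0 and n_layers - 1
    stand in for the always-trained embedding and head layers, and
    n_active middle blocks are drawn uniformly at random per step.
    """
    rng = np.random.default_rng(step_seed)
    middle = np.arange(1, n_layers - 1)
    sampled = rng.choice(middle, size=n_active, replace=False)
    return sorted({0, n_layers - 1, *sampled.tolist()})

# Per step, a training loop would set requires_grad=True only for the
# returned indices and leave every other block frozen.
```

Because only `n_active + 2` of `n_layers` blocks carry optimizer state at any step, the memory profile approaches that of parameter-efficient methods while every layer still gets updated over many steps.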

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

facebookresearch/layerskip 25 Apr 2024

We present LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs).
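The early-exit half of the idea can be sketched as follows: run the layer stack sequentially, project the hidden state through a shared output head after each layer, and stop as soon as the prediction looks confident. This is an illustrative sketch only; the toy `tanh` block, the shared head, and the max-softmax confidence rule are assumptions, not the paper's exact exit criterion.

```python
import numpy as np

def early_exit_logits(h, layers, lm_head, threshold=0.9):
    """Early-exit inference sketch in the spirit of LayerSkip.

    Applies each layer in turn, decodes the intermediate hidden state
    with a shared head, and exits once the top softmax probability
    clears `threshold`, skipping the remaining layers.
    """
    for depth, W in enumerate(layers, start=1):
        h = np.tanh(W @ h)                 # stand-in for a transformer block
        logits = lm_head @ h               # shared head decodes every depth
        p = np.exp(logits - logits.max())
        p /= p.sum()
        if p.max() >= threshold:
            return logits, depth           # confident: exit early
    return logits, len(layers)             # fell through: used the full stack
```

In the paper's full scheme, tokens emitted from early layers are then verified by the remaining layers (self-speculative decoding), so early exits buy speed without changing the final output distribution.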

metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models

adkipnis/metabench 4 Jul 2024

Large Language Models (LLMs) vary in their abilities on a range of tasks.