HellaSwag

21 papers with code • 1 benchmark • 1 dataset


Most implemented papers

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

allenai/dolma NA 2021

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

HellaSwag: Can a Machine Really Finish Your Sentence?

facebookresearch/text_characterization_toolkit ACL 2019

In this paper, we show that commonsense inference still proves difficult for even state-of-the-art models, by presenting HellaSwag, a new challenge dataset.
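
HellaSwag items are multiple choice: each context comes with four candidate endings. A common way to evaluate a language model on them is to pick the ending to which it assigns the highest log-likelihood. The sketch below is a minimal illustration under that assumption. `pick_ending` and `accuracy` are hypothetical helpers, the per-token log-probabilities are assumed to come from some model, and normalizing by token count is only one common choice (some evaluation harnesses normalize by character length instead).

```python
from typing import Sequence


def pick_ending(ending_logprobs: Sequence[Sequence[float]], normalize: bool = True) -> int:
    """Index of the candidate ending with the highest score.

    ending_logprobs[i] holds the per-token log-probabilities a model assigns to
    ending i given the context. With normalize=True the score is the mean
    log-probability per token; otherwise it is the summed log-probability.
    """
    scores = []
    for token_logprobs in ending_logprobs:
        total = sum(token_logprobs)
        scores.append(total / len(token_logprobs) if normalize else total)
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(examples: Sequence[Sequence[Sequence[float]]], labels: Sequence[int],
             normalize: bool = True) -> float:
    """Fraction of examples whose highest-scoring ending matches the gold label."""
    correct = sum(pick_ending(ex, normalize) == label for ex, label in zip(examples, labels))
    return correct / len(labels)
```

Normalization matters because summed log-probabilities penalize longer endings. Consider a four-token ending at -0.9 per token and a one-token ending at -2.0. The sum prefers the short ending, while the per-token mean prefers the long one.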

Training Compute-Optimal Large Language Models

karpathy/llama2.c 29 Mar 2022

We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
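
A common back-of-the-envelope version of this trade-off rests on two approximations. The first puts training compute at C ≈ 6·N·D FLOPs for N parameters and D tokens. The second is the roughly 20-tokens-per-parameter ratio often quoted from this paper. These are simplifications, not the paper's fitted scaling laws. Under them, C = 6·r·N² for a ratio r = D/N, so N = sqrt(C / (6r)). A minimal sketch follows; the function name `compute_optimal_split` is hypothetical.

```python
import math


def compute_optimal_split(compute_flops: float,
                          tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Return (parameters, training tokens) for a FLOP budget.

    Assumes C ≈ 6 * N * D and a fixed ratio D = tokens_per_param * N,
    so C = 6 * tokens_per_param * N**2 and N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens
```

Plugging in 70B parameters and 1.4T tokens (the Chinchilla configuration) gives C ≈ 6 × 70e9 × 1.4e12 ≈ 5.9e23 FLOPs under the 6ND approximation. The function maps that budget back to about 7e10 parameters and 1.4e12 tokens.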

UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark

allenai/rainbow 24 Mar 2021

First, we propose a new multitask benchmark, RAINBOW, to promote research on commonsense models that generalize well over multiple tasks and datasets.

When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation

huawei-noah/kd-nlp Findings (ACL) 2022

From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial data augmentation (DA).
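
The selection step described here amounts to a top-k by loss over the pre-generated pool. The sketch below is a minimal illustration of that step only. It assumes a hypothetical `loss_fn` that scores each augmented sample under the current model. It omits the rest of Glitter, such as how the pool is built and how the selected samples enter training.

```python
import heapq
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def select_worst_case(pool: Iterable[T], loss_fn: Callable[[T], float], k: int) -> list[T]:
    """Return the k samples from `pool` with the largest loss, highest first.

    `loss_fn` is a hypothetical stand-in for evaluating the current model's
    loss on one augmented sample.
    """
    return heapq.nlargest(k, pool, key=loss_fn)
```

Using `heapq.nlargest` avoids fully sorting a large pool when k is small. In a training loop, this would typically be called once per original example on that example's own pool of augmentations.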

Toward Adversarial Training on Contextualized Language Representation

gingasan/creat 8 May 2023

Based on this observation, we propose a simple yet effective method, Contextualized representation-Adversarial Training (CreAT), in which the attack is explicitly optimized to deviate the contextualized representation of the encoder.
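
The following toy numpy sketch illustrates the general idea. Projected gradient ascent finds a perturbation on the input embedding that moves the encoder's output representation as far as possible from its clean value. Every concrete choice here is an assumption for illustration, not CreAT's exact formulation:

* the "encoder" is a single tanh layer `tanh(W @ e)`;
* deviation is measured as squared L2 distance;
* the perturbation is kept inside an L2 ball of radius `epsilon`;
* the gradient is written out by hand.

```python
import numpy as np


def representation_attack(W: np.ndarray, e: np.ndarray, epsilon: float = 0.1,
                          step: float = 0.05, n_steps: int = 10, seed: int = 0) -> np.ndarray:
    """Find delta with ||delta||_2 <= epsilon that pushes tanh(W @ (e + delta))
    away from the clean representation tanh(W @ e). Toy encoder, assumed objective."""
    rng = np.random.default_rng(seed)
    h_clean = np.tanh(W @ e)
    # Start from a tiny random perturbation: the deviation's gradient is exactly zero at delta = 0.
    delta = rng.normal(size=e.shape)
    delta *= 1e-3 / np.linalg.norm(delta)
    for _ in range(n_steps):
        h_adv = np.tanh(W @ (e + delta))
        # Gradient of ||h_adv - h_clean||^2 w.r.t. delta (chain rule through tanh).
        grad = W.T @ (2.0 * (h_adv - h_clean) * (1.0 - h_adv ** 2))
        grad_norm = np.linalg.norm(grad)
        if grad_norm == 0.0:
            break
        delta = delta + step * grad / grad_norm  # normalized ascent step
        delta_norm = np.linalg.norm(delta)
        if delta_norm > epsilon:
            delta *= epsilon / delta_norm  # project back onto the L2 ball
    return delta
```

In adversarial training, the model would then be updated on the perturbed input `e + delta`. That outer training loop is not shown.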

In-Contextual Gender Bias Suppression for Large Language Models

livnlp/prompt_bias_suppression 13 Sep 2023

We show that, using the CrowS-Pairs dataset, our textual preambles covering counterfactual statements can suppress gender biases in English LLMs such as LLaMA2.

An Open Source Data Contamination Report for Large Language Models

liyucheng09/contamination_detector 26 Oct 2023

We also introduce an open-source pipeline that enables the community to perform contamination analysis on customised data and models.
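
The paper's own pipeline is not reproduced here. As a simpler, generic illustration of what a contamination check computes, the sketch below flags test examples whose word n-grams mostly appear in a reference corpus. The n-gram size, the 0.5 threshold, whitespace tokenization, and the function names are all assumptions.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased, whitespace-tokenized word n-grams of `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_contaminated(test_examples: list[str], corpus_docs: list[str],
                      n: int = 8, threshold: float = 0.5) -> list[int]:
    """Return indices of test examples whose n-gram overlap with the corpus is >= threshold."""
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    flagged = []
    for i, example in enumerate(test_examples):
        grams = ngrams(example, n)
        if not grams:  # example shorter than n words: nothing to compare
            continue
        if len(grams & corpus_grams) / len(grams) >= threshold:
            flagged.append(i)
    return flagged
```

Exact n-gram matching misses paraphrased leaks, so heuristics like this one tend to give a lower bound on contamination rather than a complete picture.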

Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models

eternityyw/gemini-commonsense-evaluation 29 Dec 2023

To address this gap, our study undertakes a thorough evaluation of Gemini's performance in complex reasoning tasks that necessitate the integration of commonsense knowledge across modalities.

Attacks on Node Attributes in Graph Neural Networks

YingXu001/Attacks_on_Graph_Node_Attributes 19 Feb 2024

Graphs are commonly used to model complex networks prevalent in modern social media and literacy applications.