HellaSwag
21 papers with code • 1 benchmark • 1 dataset
Benchmarks
These leaderboards are used to track progress in HellaSwag
| Trend | Dataset | Best Model | Paper | Code | Compare |
|---|---|---|---|---|---|
Most implemented papers
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.
HellaSwag: Can a Machine Really Finish Your Sentence?
In this paper, we show that commonsense inference still proves difficult for even state-of-the-art models, by presenting HellaSwag, a new challenge dataset.
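Models are usually scored on HellaSwag as a four-way multiple-choice task: each candidate ending is appended to the context and ranked by language-model log-likelihood, often length-normalized. The sketch below illustrates that scoring loop; the `sequence_logprob` helper is a hypothetical placeholder, not part of the dataset release or any specific evaluation harness.

```python
# Minimal sketch of multiple-choice scoring on a HellaSwag-style item.

def sequence_logprob(model, context: str, continuation: str) -> float:
    """Placeholder: total log-probability of `continuation` given `context`.
    The actual implementation depends on the LM library in use."""
    raise NotImplementedError

def score_item(model, context: str, endings: list[str]) -> int:
    # Length-normalized log-likelihood is a common choice; raw sums also appear.
    scores = [
        sequence_logprob(model, context, e) / max(len(e.split()), 1)
        for e in endings
    ]
    return scores.index(max(scores))  # index of the predicted ending
```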
Training Compute-Optimal Large Language Models
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
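A frequently cited takeaway from this line of work is that parameters and training tokens should scale roughly in proportion, often summarized as about 20 tokens per parameter with training compute approximated as C ≈ 6·N·D FLOPs. The sketch below applies that rule of thumb; the constants are the commonly quoted approximations, not the paper's exact fitted scaling laws.

```python
# Rough compute-optimal sizing under the commonly quoted approximations:
#   training FLOPs   C ≈ 6 * N * D   (N = parameters, D = training tokens)
#   compute-optimal  D ≈ 20 * N      (tokens-per-parameter rule of thumb)

def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    # Substitute D = tokens_per_param * N into C = 6 * N * D and solve for N.
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    n, d = compute_optimal_split(5.76e23)  # roughly Chinchilla-scale budget
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")  # ~7e10 params, ~1.4e12 tokens
```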
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
First, we propose a new multitask benchmark, RAINBOW, to promote research on commonsense models that generalize well over multiple tasks and datasets.
When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation
From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial data augmentation.
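The selection step described here, keeping only the augmented copies on which the current model incurs the highest loss, can be sketched as follows. The per-sample loss callback and pool structure are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: from a pool of pre-generated augmentations of an example, keep the
# k copies on which the current model incurs the highest loss (worst-case selection).
from typing import Callable, Sequence

def select_worst_case(augmented_pool: Sequence, loss_fn: Callable, k: int) -> list:
    # loss_fn(sample) -> float is the current model's loss on one augmented sample.
    ranked = sorted(augmented_pool, key=loss_fn, reverse=True)
    return list(ranked[:k])
```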
Toward Adversarial Training on Contextualized Language Representation
Based on this observation, we propose a simple yet effective Contextualized representation-Adversarial Training (CreAT), in which the attack is explicitly optimized to deviate the contextualized representation of the encoder.
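The idea, as described, is to craft the perturbation so that it maximally shifts the encoder's contextualized representations rather than the task loss alone. Below is a hedged single-step sketch in PyTorch; the deviation objective (mean-squared distance between clean and perturbed hidden states) and the `encoder` interface are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def representation_deviation_perturbation(encoder, embeddings: torch.Tensor,
                                           epsilon: float = 1e-2) -> torch.Tensor:
    """One-step sketch: find a small delta on the input embeddings that
    maximally deviates the encoder's contextualized representations."""
    with torch.no_grad():
        clean_hidden = encoder(embeddings)            # [batch, seq, hidden]

    # Start from a small random perturbation so the deviation gradient is non-zero.
    delta = (torch.randn_like(embeddings) * 1e-3).requires_grad_(True)
    perturbed_hidden = encoder(embeddings + delta)

    # Deviation objective: push perturbed representations away from clean ones.
    deviation = torch.mean((perturbed_hidden - clean_hidden) ** 2)
    deviation.backward()

    # Gradient-ascent step, scaled to an epsilon ball (sign-based, FGSM-like).
    return epsilon * delta.grad.sign()
```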
In-Contextual Gender Bias Suppression for Large Language Models
We show that, using the CrowS-Pairs dataset, our textual preambles covering counterfactual statements can suppress gender biases in English LLMs such as LLaMA2.
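Mechanically, the intervention is in-context: counterfactual preambles are prepended to the prompt before querying the model. A minimal sketch is below; the preamble texts are illustrative placeholders, not the ones used in the paper.

```python
# Sketch: in-context bias suppression by prepending counterfactual preambles.
# The preamble strings are illustrative placeholders, not the paper's.
COUNTERFACTUAL_PREAMBLES = [
    "Nurses can be of any gender.",
    "Engineers can be of any gender.",
]

def with_bias_suppressing_preamble(prompt: str) -> str:
    return " ".join(COUNTERFACTUAL_PREAMBLES) + "\n" + prompt
```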
An Open Source Data Contamination Report for Large Language Models
We also introduce an open-source pipeline that enables the community to perform contamination analysis on customised data and models.
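One common way such pipelines flag contamination is n-gram overlap between a benchmark item and training documents. The sketch below illustrates that overlap-based idea under assumed defaults (13-token n-grams, whitespace tokenization); it is not necessarily the method implemented by the released pipeline.

```python
# Sketch: flag a benchmark example as potentially contaminated if any of its
# word n-grams also appears in a training document.

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(example: str, training_docs: list[str], n: int = 13) -> bool:
    example_ngrams = ngrams(example, n)
    return any(example_ngrams & ngrams(doc, n) for doc in training_docs)
```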
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
To address this gap, our study undertakes a thorough evaluation of Gemini's performance in complex reasoning tasks that necessitate the integration of commonsense knowledge across modalities.
Attacks on Node Attributes in Graph Neural Networks
Graphs are commonly used to model complex networks prevalent in modern social media and literacy applications.