HELM (Holistic Evaluation of Language Models)

Introduced by Liang et al. in Holistic Evaluation of Language Models

Holistic Evaluation of Language Models (HELM) is a framework for evaluating foundation language models, developed at Stanford University's Center for Research on Foundation Models (CRFM). It is maintained as a living benchmark, updated as new scenarios, metrics, and models appear, with the goal of improving the transparency of language models. The key aspects of HELM are:

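  • Purpose: HELM aims to give a holistic view of language models by evaluating them along several dimensions rather than on a single score.
  • Coverage: It spans a wide range of scenarios (use cases) and explicitly acknowledges that its own coverage is incomplete, so that what has not yet been evaluated is stated rather than left implicit.
  • Metrics: HELM reports multiple metrics for each scenario instead of accuracy alone: accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
  • Standardization: Models are evaluated on the same scenarios under the same conditions, so that results are directly comparable.
  • Accessibility: All data and analyses are freely accessible on the HELM website for exploration and study.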
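
The multi-metric, multi-scenario idea described above can be illustrated with a small sketch. This is not HELM's code or API: the `Prediction` type, the `evaluate` function, the scenario names, and the toy data are all hypothetical, and only two of the metric families (accuracy and a simple binned calibration error) are shown. It assumes each model output has already been scored as correct or incorrect and carries a confidence value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Prediction:
    """Hypothetical record for one scored model output (not a HELM type)."""
    correct: bool       # did the output match the reference?
    confidence: float   # model's probability for its own answer, in [0, 1]


def accuracy(preds: List[Prediction]) -> float:
    """Fraction of predictions that are correct."""
    return sum(p.correct for p in preds) / len(preds)


def calibration_error(preds: List[Prediction], n_bins: int = 10) -> float:
    """Binned expected calibration error: the size-weighted average gap
    between mean confidence and accuracy within each confidence bin."""
    bins: List[List[Prediction]] = [[] for _ in range(n_bins)]
    for p in preds:
        idx = min(int(p.confidence * n_bins), n_bins - 1)
        bins[idx].append(p)
    total = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p.confidence for p in b) / len(b)
        total += len(b) / len(preds) * abs(accuracy(b) - mean_conf)
    return total


# Every scenario is scored with every metric, giving a scenario x metric table.
METRICS: Dict[str, Callable[[List[Prediction]], float]] = {
    "accuracy": accuracy,
    "calibration_error": calibration_error,
}


def evaluate(results: Dict[str, List[Prediction]]) -> Dict[str, Dict[str, float]]:
    """Apply all metrics to all scenarios under identical conditions."""
    return {
        scenario: {name: fn(preds) for name, fn in METRICS.items()}
        for scenario, preds in results.items()
    }


# Toy, made-up data for two hypothetical scenarios.
toy_results = {
    "question_answering": [
        Prediction(True, 0.9), Prediction(True, 0.9),
        Prediction(False, 0.9), Prediction(True, 0.9),
    ],
    "summarization": [Prediction(True, 0.5), Prediction(False, 0.5)],
}

table = evaluate(toy_results)
for scenario, scores in table.items():
    print(scenario, scores)
```

Keeping the metrics in one shared dictionary means a new metric is applied to every scenario automatically, which is the property that makes gaps in coverage visible.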

License


  • Unknown