HELM (Holistic Evaluation of Language Models)

Introduced by Liang et al. in Holistic Evaluation of Language Models

The Holistic Evaluation of Language Models (HELM) is an evaluation framework for foundation language models, developed at Stanford University's Center for Research on Foundation Models (CRFM). It is maintained as a living benchmark that is updated over time, with the goal of making language models more transparent. The key aspects of HELM are:

Purpose: HELM aims to give a holistic view of language models by evaluating them along several dimensions rather than by a single score.
Coverage: It spans a wide range of scenarios (use cases) and explicitly acknowledges where its own coverage is incomplete.
Metrics: HELM reports multiple metrics for each scenario where possible: accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
Standardization: Models are evaluated on the same scenarios under the same conditions, so that results are directly comparable.
Accessibility: All data and analyses are freely accessible on the HELM website for exploration and study.
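
The combination of broad scenario coverage and multiple metrics can be pictured as a grid in which every scenario is scored under every metric. The sketch below is a minimal, self-contained illustration of that idea only. It does not use the HELM codebase or its API, and every name in it (toy_model, SCENARIOS, METRICS, evaluate) is hypothetical. The "robustness" metric here is a deliberately crude stand-in (accuracy after upper-casing the prompt), not the perturbations HELM actually applies.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]      # (prompt, reference answer)
Model = Callable[[str], str]   # maps a prompt to a predicted answer


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples whose prediction exactly matches the reference."""
    return sum(model(prompt) == ref for prompt, ref in examples) / len(examples)


def robustness(model: Model, examples: List[Example]) -> float:
    """Accuracy after a simple perturbation: upper-casing each prompt."""
    perturbed = [(prompt.upper(), ref) for prompt, ref in examples]
    return accuracy(model, perturbed)


# Hypothetical metric registry: every scenario is scored under every metric.
METRICS: Dict[str, Callable[[Model, List[Example]], float]] = {
    "accuracy": accuracy,
    "robustness": robustness,
}


def evaluate(
    model: Model, scenarios: Dict[str, List[Example]]
) -> Dict[str, Dict[str, float]]:
    """Return a scenario -> metric -> score table for one model."""
    return {
        name: {metric: fn(model, examples) for metric, fn in METRICS.items()}
        for name, examples in scenarios.items()
    }


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model: a case-sensitive lookup."""
    answers = {"2+2=": "4", "3+3=": "6", "capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


# Hypothetical scenarios, each a small list of (prompt, reference) pairs.
SCENARIOS: Dict[str, List[Example]] = {
    "arithmetic": [("2+2=", "4"), ("3+3=", "6")],
    "qa": [("capital of France?", "Paris"), ("capital of Peru?", "Lima")],
}

results = evaluate(toy_model, SCENARIOS)
for scenario, scores in results.items():
    print(scenario, scores)
```

In this toy setup the "qa" scenario scores differently under the two metrics, which is the kind of gap a single accuracy number would hide and a multi-metric report makes visible.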

Similar Datasets

M3KE

BBQ

CValues

BIG-bench
