HHH (Helpful, Honest, & Harmless)

Introduced by Askell et al. in A General Language Assistant as a Laboratory for Alignment

The HHH dataset, also known as the Helpful, Honest, & Harmless (HHH) Alignment dataset, is a dataset used for evaluating language models. It is pragmatically broken down into the categories of helpfulness, honesty/accuracy, and harmlessness. The dataset is formatted in terms of binary comparisons, often broken down from a ranked ordering of three or four possible responses to a given query or context. The goal of these evaluations is that on careful reflection, the vast majority of people would agree that the chosen response is better (more helpful, honest, and harmless) than the alternative offered for comparison.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages