The HHH dataset, also known as the Helpful, Honest, & Harmless (HHH) Alignment dataset, is a dataset used for evaluating language models. It is pragmatically broken down into the categories of helpfulness, honesty/accuracy, and harmlessness. The dataset is formatted in terms of binary comparisons, often broken down from a ranked ordering of three or four possible responses to a given query or context. The goal of these evaluations is that on careful reflection, the vast majority of people would agree that the chosen response is better (more helpful, honest, and harmless) than the alternative offered for comparison.
Paper | Code | Results | Date | Stars |
---|