folktexts

Introduced by Cruz et al. in Evaluating language models as risk scores

A collection of natural-language prompt-completion pairs for multiple-choice Q&A on benchmark tasks derived from US census data products. The benchmark tasks are available through a Python package named folktexts. Its main goal is to serve as a basis for evaluating how well LLMs quantify uncertainty on inherently unpredictable outcomes, i.e., how well they capture aleatoric uncertainty. It is essentially a natural-language version of the popular folktables tabular data package.
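To make the idea concrete, the sketch below shows one way a tabular record could be rendered as a multiple-choice prompt and how a risk score could be read off the model's answer-label probabilities. It does not use the folktexts API: the helper names `row_to_prompt` and `risk_score`, the prompt wording, the column names, and the log-probability values are all illustrative assumptions.

```python
import math


def row_to_prompt(row, question, choices):
    """Render one tabular record as a multiple-choice question.

    The wording and layout are hypothetical, not the folktexts format.
    """
    lines = ["The following data describes a survey respondent:"]
    lines += [f"- {name}: {value}" for name, value in row.items()]
    lines.append("")
    lines.append(f"Question: {question}")
    lines += [f"{label}. {text}" for label, text in choices.items()]
    lines.append("Answer:")
    return "\n".join(lines)


def risk_score(choice_logprobs, positive_label):
    """Renormalize log-probabilities over the answer labels only
    and return the probability assigned to the positive label."""
    max_lp = max(choice_logprobs.values())  # subtract max for numerical stability
    weights = {k: math.exp(v - max_lp) for k, v in choice_logprobs.items()}
    return weights[positive_label] / sum(weights.values())


# Hypothetical record and answer options (illustrative values only).
row = {"age": 42, "occupation": "teacher", "hours worked per week": 40}
choices = {"A": "Above $50,000", "B": "Below $50,000"}
prompt = row_to_prompt(row, "What is this person's yearly income?", choices)

# Made-up next-token log-probabilities for the labels "A" and "B";
# in practice these would come from a language model's output.
score = risk_score({"A": math.log(0.3), "B": math.log(0.1)}, "A")
```

Renormalizing over the answer labels alone discards probability mass the model places on other tokens, so the resulting score can be compared against observed outcome frequencies to assess calibration.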
