A collection of natural-language prompt-completion pairs for multiple-choice Q&A on benchmark tasks derived from US census data products.
Benchmark tasks are made available through a Python package named folktexts.
The main goal is to provide a basis for evaluating how well LLMs quantify uncertainty on inherently unpredictable outcomes, i.e., how well they capture aleatoric uncertainty.
This is essentially a natural-language version of the popular folktables tabular data package.
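
To make the evaluation goal above more concrete, here is a minimal sketch of one common way to score uncertainty quantification on binary outcomes: the expected calibration error (ECE). It is an illustration only, not the method used by folktexts; the function name `expected_calibration_error`, the equal-width binning, and the toy numbers are all assumptions introduced here. It presumes that each model answer has already been mapped to a risk score in [0, 1] and each true outcome to 0 or 1.

```python
from typing import Sequence


def expected_calibration_error(
    scores: Sequence[float],
    labels: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Expected calibration error (ECE) with equal-width bins over [0, 1].

    Hypothetical helper for illustration; not part of folktexts or folktables.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one prediction")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")

    # Each bin accumulates [sum of scores, sum of labels, number of samples].
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[b][0] += s
        bins[b][1] += y
        bins[b][2] += 1

    n = len(scores)
    ece = 0.0
    for score_sum, label_sum, count in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        mean_score = score_sum / count      # average predicted risk in the bin
        positive_rate = label_sum / count   # observed outcome frequency in the bin
        ece += (count / n) * abs(mean_score - positive_rate)
    return ece


# Toy, made-up numbers purely for illustration (not taken from the dataset).
toy_scores = [0.9, 0.9, 0.1, 0.1]
toy_labels = [1, 0, 0, 0]
toy_ece = expected_calibration_error(toy_scores, toy_labels)
```

A lower value means the predicted risk scores track the observed outcome frequencies more closely. Because the outcomes are inherently uncertain, a well-calibrated model is not expected to reach perfect accuracy, only to report probabilities that match how often each outcome actually occurs.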