PopQA is an open-domain QA dataset of 14k QA pairs, each annotated with a fine-grained Wikidata entity ID, Wikipedia page views, and relationship type.
22 PAPERS • NO BENCHMARKS YET
BioLAMA is a benchmark comprising 49K biomedical factual knowledge triples for probing biomedical language models. It is used to assess how well language models can serve as biomedical knowledge bases.
13 PAPERS • 1 BENCHMARK
The $\text{BEAR}$ dataset and its larger version, $\text{BEAR}_{\text{big}}$, are benchmarks for evaluating common factual knowledge contained in language models.
1 PAPER • 1 BENCHMARK