3 dataset results for Knowledge Probing AND English

PopQA

PopQA is an open-domain QA dataset of 14k QA pairs, each with a fine-grained Wikidata entity ID, Wikipedia page views, and relationship type information.

22 PAPERS • NO BENCHMARKS YET

BioLAMA

BioLAMA is a benchmark comprising 49K biomedical factual knowledge triples for probing biomedical language models. It is used to assess how well language models can serve as valid biomedical knowledge bases.

13 PAPERS • 1 BENCHMARK

BEAR-big (Benchmark for Evaluating Associative Reasoning)

The $\text{BEAR}$ dataset and its larger version, $\text{BEAR}_{\text{big}}$, are benchmarks for evaluating common factual knowledge contained in language models.

1 PAPER • 1 BENCHMARK
