FB15k-237 is a link prediction dataset created from FB15k. While FB15k consists of 1,345 relations, 14,951 entities, and 592,213 triples, many triples are inverses that cause leakage from the training to testing and validation splits. FB15k-237 was created by Toutanova and Chen (2015) to ensure that the testing and evaluation datasets do not have inverse relation test leakage. In summary, FB15k-237 dataset contains 310,116 triples with 14,541 entities and 237 relation types.
410 PAPERS • 3 BENCHMARKS
The Unified Medical Language System (UMLS) is a comprehensive resource that integrates and disseminates essential terminology, classification standards, and coding systems. Its purpose is to foster the creation of more effective and interoperable biomedical information systems and services, including electronic health records. Here are the key aspects of the UMLS:
18 PAPERS • 1 BENCHMARK
The IS-A dataset is a dataset of relations extracted from a medical ontology. The different entities in the ontology are related by the “is a” relation. For example, ‘acute leukemia’ is a ‘leukemia’. The dataset has 294,693 nodes with 356,541 edges between them.
4 PAPERS • NO BENCHMARKS YET
The Nations dataset is a small knowledge graph with 14 entities, 55 relations, and 1992 triples describing countries and their political relationships. This dataset is available for download from https://github.com/ZhenfengLei/KGDatasets.
2 PAPERS • NO BENCHMARKS YET
GO21 is a biomedical knowledge graph that models genes, proteins, drugs, and the hierarchy of the biological processes they participate in. It consists of 806,136 triples with 21 relations and 89127 entities. GO21 can be used for knowledge graph completion tasks (link prediction) as well as hierarchical reasoning tasks, such as ancestor-descendant prediction task proposed in the paper.
1 PAPER • 1 BENCHMARK
The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links) constructed using User-Generated Content data collected from Flickr social media platform in three global cities containing UNESCO World Heritage property (Amsterdam, Suzhou, Venice). The motivation of data collection in this project is to provide datasets that could be both directly applicable for ML communities as test-bed, and theoretically informative for heritage and urban scholars to draw conclusions on for planning decision-making.
1 PAPER • NO BENCHMARKS YET
The PART-OF dataset is a dataset of relations extracted from a medical ontology. The different entities in the ontology are parts of the human body. The dataset has 16,894 nodes with 19,436 edges between them.