24 dataset results for Graphs AND Link Prediction

HeriGraph (Multimodal Machine Learning Datasets on Graphs of Heritage Values and Attributes)

The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links) constructed

1 PAPER • NO BENCHMARKS YET

Long Range Graph Benchmark (LRGB)

The Long Range Graph Benchmark (LRGB) is a collection of 5 graph learning datasets that arguably require long-range reasoning to achieve strong performance in a given task. The 5 datasets in this benchmark can be used to prototype new models that can capture long-range dependencies in graphs.

| Dataset | Domain | Task |
|---|---|---|
| PascalVOC-SP | Computer Vision | Node Classification |
| COCO-SP | Computer Vision | Node Classification |
| PCQM-Contact | Quantum Chemistry | Link Prediction |
| Peptides-func | Chemistry | Graph Classification |
| Peptides-struct | Chemistry | Graph Regression |

49 PAPERS • 5 BENCHMARKS

OGB-LSC (OGB Large-Scale Challenge)

OGB Large-Scale Challenge (OGB-LSC) is a collection of three real-world datasets for advancing the state-of-the-art in large-scale graph ML. OGB-LSC provides graph datasets that are orders of magnitude larger than existing ones and covers three core graph learning tasks -- link prediction, graph regression, and node classification. MAG240M-LSC is a heterogeneous academic graph, and the task is to predict the subject areas of papers situated in the heterogeneous graph (node classification). WikiKG90M-LSC is a knowledge graph, and the task is to impute missing triplets (link prediction). PCQM4M-LSC is a quantum chemistry dataset, and the task is to predict an important molecular property, the HOMO-LUMO gap, of a given molecule (graph regression).

31 PAPERS • 3 BENCHMARKS

MMKG

MMKG is a collection of three knowledge graphs for link prediction and entity matching research. Contrary to other knowledge graph datasets, these knowledge graphs contain both numerical features and images for all entities as well as entity alignments between pairs of KGs. The three knowledge graphs augmented with numerical features and images are called FB15k, YAGO15k, and DBPEDIA15k.

44 PAPERS • 5 BENCHMARKS

CoDEx Medium

CoDEx comprises a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified

4 PAPERS • 1 BENCHMARK

CoDEx Large

3 PAPERS • 1 BENCHMARK

CoDEx Small

3 PAPERS • 1 BENCHMARK

PPI

PPI (Protein-Protein Interactions (PPI))

…protein roles, in terms of their cellular functions from gene ontology, in various protein-protein interaction (PPI) graphs, with each graph corresponding to a different human tissue [41]. The average graph contains 2373 nodes, with an average degree of 28.8.

286 PAPERS • 2 BENCHMARKS
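The PPI figures above imply a rough edge count per graph. The sketch below assumes the graphs are undirected and that the reported average degree counts each edge once at both endpoints, so that average degree = 2|E| / |V|; neither assumption is stated in the listing, and `estimated_edges` is a hypothetical helper name.

```python
def estimated_edges(num_nodes: int, avg_degree: float) -> float:
    """Estimate the edge count of an undirected graph.

    Assumes avg_degree = 2 * |E| / |V| (each edge contributes to two degrees).
    """
    return num_nodes * avg_degree / 2


# Figures quoted in the PPI description: 2373 nodes, average degree 28.8.
approx_edges = estimated_edges(2373, 28.8)
print(round(approx_edges))  # roughly 34 thousand edges per average graph
```

If the underlying graphs were directed, or the degree were computed differently, the estimate would change by a factor of two.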

Wikidata5M

Wikidata5m is a million-scale knowledge graph dataset with an aligned corpus. This dataset integrates the Wikidata knowledge graph and Wikipedia pages. The dataset is distributed as a knowledge graph, a corpus, and aliases. We provide both transductive and inductive data splits used in the original paper.

46 PAPERS • 1 BENCHMARK

COLLAB

…A graph corresponds to a researcher’s ego network, i.e., the researcher and their collaborators are nodes and an edge indicates collaboration between two researchers. The dataset has 5,000 graphs and each graph has label 0, 1, or 2.

233 PAPERS • 2 BENCHMARKS

SNAP (Stanford Large Network Dataset Collection)

…It includes graphs representing social networks, citation networks, web graphs, online communities, online reviews and more.

… representing communication
Citation networks: nodes represent papers, edges represent citations
Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
Web graphs …
… co-purchased products
Internet networks: nodes represent computers and edges communication
Road networks: nodes represent intersections and edges roads connecting the intersections
Autonomous systems: graphs …
Face-to-face communication networks: networks of face-to-face (non-online) interactions
Graph classification datasets: disjoint graphs from different classes

154 PAPERS • NO BENCHMARKS YET

KG20C

KG20C (A scholarly knowledge graph benchmark dataset)

KG20C is a knowledge graph about high-quality papers from 20 top computer science conferences. It can serve as a standard benchmark dataset in scholarly data analysis for several tasks, including knowledge graph embedding, link prediction, recommendation systems, and question answering.

4 PAPERS • 1 BENCHMARK

OGB (Open Graph Benchmark)

The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs.

841 PAPERS • 16 BENCHMARKS

GO21

GO21 is a biomedical knowledge graph that models genes, proteins, drugs, and the hierarchy of the biological processes they participate in. GO21 can be used for knowledge graph completion tasks (link prediction) as well as hierarchical reasoning tasks, such as the ancestor-descendant prediction task proposed in the paper.

1 PAPER • 1 BENCHMARK

arXiv Astro-Ph

…If an author i co-authored a paper with author j, the graph contains an undirected edge between i and j. If a paper is co-authored by k authors, this generates a completely connected (sub)graph on k nodes.

10 PAPERS • 2 BENCHMARKS
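The co-authorship construction described for arXiv Astro-Ph (each paper with k authors contributes a clique on k nodes) can be sketched as follows. This is a minimal illustration on made-up author lists, not the dataset's actual preprocessing; `coauthorship_edges` and the toy `papers` input are hypothetical.

```python
from itertools import combinations


def coauthorship_edges(papers):
    """Return the set of undirected edges induced by a list of author lists."""
    edges = set()
    for authors in papers:
        # Deduplicate and sort so every pair is stored once as (i, j) with i < j.
        unique = sorted(set(authors))
        # A paper with k authors yields all k * (k - 1) / 2 author pairs (a clique).
        edges.update(combinations(unique, 2))
    return edges


# Toy input: three papers with overlapping authors (illustrative only).
papers = [["a", "b", "c"], ["c", "d"], ["a", "b"]]
edges = coauthorship_edges(papers)
print(sorted(edges))  # [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd')]
```

Storing each pair in sorted order keeps the edge set undirected, so a repeated collaboration such as the third paper adds no new edge.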

Arxiv GR-QC

Arxiv GR-QC (General Relativity and Quantum Cosmology collaboration network)

3 PAPERS • 2 BENCHMARKS

FB1.5M

The FB1.5M dataset is a benchmark for Knowledge Graph Completion. It is based on Freebase and contains 30 low-resource relations, defined as relations with fewer than 500 triplets.

1 PAPER • NO BENCHMARKS YET
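The low-resource criterion in the FB1.5M description (relations with fewer than 500 triplets) amounts to a frequency filter over relations. The sketch below assumes triples are available as `(head, relation, tail)` tuples, which the listing does not specify; `low_resource_relations` and the toy data are hypothetical.

```python
from collections import Counter


def low_resource_relations(triples, threshold=500):
    """Return relations that occur in fewer than `threshold` triples."""
    # Count how many triples use each relation.
    counts = Counter(relation for _, relation, _ in triples)
    return {relation for relation, n in counts.items() if n < threshold}


# Toy triples (illustrative only); a small threshold stands in for 500.
triples = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("carol", "works_at", "initech"),
    ("alice", "born_in", "paris"),
]
print(low_resource_relations(triples, threshold=2))  # {'born_in'}
```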

Nations

The Nations dataset is a small knowledge graph with 14 entities, 55 relations, and 1992 triples describing countries and their political relationships.

2 PAPERS • NO BENCHMARKS YET

ACM

ACM (Association for Computing Machinery)

…A heterogeneous graph is constructed, which comprises 3025 papers, 5835 authors, and 56 subjects. Paper features correspond to elements of a bag-of-words representation of keywords.

4 PAPERS • 1 BENCHMARK

DBLP

DBLP (Citation Network Dataset)

…The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations.

208 PAPERS • 5 BENCHMARKS

Wiki-CS

Wiki-CS is a Wikipedia-based dataset for benchmarking Graph Neural Networks.

73 PAPERS • 2 BENCHMARKS

YAGO (Yet Another Great Ontology)

Yet Another Great Ontology (YAGO) is a Knowledge Graph that augments WordNet with common knowledge facts extracted from Wikipedia, converting WordNet from a primarily linguistic resource to a common knowledge

341 PAPERS • 7 BENCHMARKS

Decagon (Bio-decagon)

Bio-decagon is a dataset for the polypharmacy side effect identification problem, framed as a multirelational link prediction problem in a two-layer multimodal graph/network of two node types: drugs and proteins

31 PAPERS • 1 BENCHMARK

UMLS

UMLS (Unified Medical Language System)

…- National Library of Medicine. https://www.nlm.nih.gov/research/umls/index.html. (2) UMLS Metathesaurus Browser. https://uts.nlm.nih.gov/uts/umls/home. (3) GitHub - dongwookim-ml/kg-data: knowledge-graph

18 PAPERS • 1 BENCHMARK