The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links) constructed
1 PAPER • NO BENCHMARKS YET
The Long Range Graph Benchmark (LRGB) is a collection of 5 graph learning datasets that arguably require long-range reasoning to achieve strong performance on a given task. The 5 datasets in this benchmark can be used to prototype new models that capture long-range dependencies in graphs.

| Dataset | Domain | Task |
|---|---|---|
| PascalVOC-SP | Computer Vision | Node Classification |
| COCO-SP | Computer Vision | Node Classification |
| PCQM-Contact | Quantum Chemistry | Link Prediction |
| Peptides-func | Chemistry | Graph Classification |
| Peptides-struct | Chemistry | Graph Regression |

41 PAPERS • 5 BENCHMARKS
OGB Large-Scale Challenge (OGB-LSC) is a collection of three real-world datasets for advancing the state of the art in large-scale graph ML. OGB-LSC provides graph datasets that are orders of magnitude larger than existing ones and covers three core graph learning tasks: link prediction, graph regression, and node classification. MAG240M-LSC is a heterogeneous academic graph, and the task is to predict the subject areas of papers situated in the heterogeneous graph (node classification). WikiKG90M-LSC is a knowledge graph, and the task is to impute missing triplets (link prediction). PCQM4M-LSC is a quantum chemistry dataset, and the task is to predict an important molecular property, the HOMO-LUMO gap, of a given molecule (graph regression).
31 PAPERS • 3 BENCHMARKS
This web graph is a page-page graph of verified Facebook sites. Nodes represent official Facebook pages while the links are mutual likes between sites. This graph was collected through the Facebook Graph API in November 2017 and restricted to pages from 4 categories that are defined by Facebook.
7 PAPERS • NO BENCHMARKS YET
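Because the edges are mutual likes, a like only becomes an edge when it is reciprocated. The sketch below illustrates that filtering step under an assumption: the raw data is taken to arrive as directed (liker, liked) page-ID pairs. Both that input format and the function name `mutual_like_edges` are hypothetical and are not part of the dataset release.

```python
def mutual_like_edges(likes):
    """Keep only reciprocated likes and return them as undirected edges.

    `likes` is an iterable of directed (liker_page, liked_page) pairs;
    this input format is a hypothetical assumption for illustration.
    """
    directed = set(likes)
    undirected = set()
    for a, b in directed:
        # A mutual like requires the reverse pair to be present as well.
        if a != b and (b, a) in directed:
            # Store each undirected edge once, with the smaller ID first.
            undirected.add((min(a, b), max(a, b)))
    return sorted(undirected)


likes = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)]
print(mutual_like_edges(likes))  # [(1, 2), (3, 4)]
```

The one-directional like (2, 3) is dropped, and each reciprocated pair yields a single undirected edge.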
protein roles, in terms of their cellular functions from gene ontology, in various protein-protein interaction (PPI) graphs, with each graph corresponding to a different human tissue [41]. The average graph contains 2373 nodes, with an average degree of 28.8.
285 PAPERS • 2 BENCHMARKS
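The two reported statistics together imply a rough edge count. For an undirected simple graph, the average degree is 2|E|/|V|, because each edge adds 1 to the degree of both endpoints. Assuming the reported degree follows that convention (the source does not say), the average PPI graph has about 2373 × 28.8 / 2 ≈ 34,171 edges. The helper names below are illustrative.

```python
def average_degree(num_nodes, edges):
    """Average degree of an undirected simple graph: 2|E| / |V|.

    Assumes `edges` lists each undirected edge exactly once.
    """
    # Each undirected edge contributes 1 to the degree of both endpoints.
    return 2 * len(edges) / num_nodes


def implied_edge_count(num_nodes, avg_degree):
    """Invert 2|E| / |V| to estimate |E| from reported statistics."""
    return num_nodes * avg_degree / 2


# Toy check on a 4-cycle: every node has degree 2.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(average_degree(4, cycle))  # 2.0

# Figures reported above for the average PPI graph.
print(round(implied_edge_count(2373, 28.8)))  # 34171
```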
minesweeper is a synthetic graph emulating the eponymous game.
15 PAPERS • 1 BENCHMARK
The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs.
813 PAPERS • 16 BENCHMARKS
Roman-empire is a word dependency graph based on the Roman Empire article from the English Wikipedia.
21 PAPERS • 1 BENCHMARK
Questions is an interaction graph of users of a question-answering website based on data provided by Yandex Q.
Placenta is a benchmark dataset for node classification in an underexplored domain: predicting microanatomical tissue structures from cell graphs in placenta histology whole slide images. Cell graphs are large (>1 million nodes per image), node features are varied (64 dimensions covering 11 types of cells), class labels are imbalanced (9 classes ranging from 0.21% of the data to 40.0%), and cellular
2 PAPERS • 1 BENCHMARK
Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used to evaluate graph-based node classification, fraud detection, and anomaly detection models.

| # Nodes | Fraud % (Class=1) |
|-------|--------|
| 11,944 | 9.5 |

| Relation | # Edges |
|--------|--------|
| U-P-U | 175,608 |
| U-S-U | 3,566,479 |
| U-V-U | 1,036,737 |
| All | 4,398,392 |

Graph: we take users as nodes in the graph and design three relations: 1) U-P-U: it connects users who reviewed at least one common product; 2) U-S-U: it connects users having at least one identical star rating within
6 PAPERS • 2 BENCHMARKS
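The U-P-U relation can be read as a projection of the user-product review bipartite graph onto users: two users become neighbours if they reviewed at least one common product. The sketch below makes two assumptions. First, reviews are given as hypothetical (user, product) pairs. Second, the function name is illustrative. The actual Amazon-Fraud preprocessing may differ, and the sketch does not cover the U-S-U or U-V-U relations.

```python
from collections import defaultdict
from itertools import combinations


def shared_item_edges(reviews):
    """Connect users that reviewed at least one common product (U-P-U style).

    `reviews` is an iterable of (user, product) pairs; this input format
    is a hypothetical assumption. Returns sorted, de-duplicated
    undirected user-user edges.
    """
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)

    edges = set()
    for users in users_by_product.values():
        # Every pair of reviewers of the same product gets one edge.
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return sorted(edges)


reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u2", "p2"), ("u4", "p3")]
print(shared_item_edges(reviews))  # [('u1', 'u2'), ('u2', 'u3')]
```

A product with k reviewers contributes k(k-1)/2 candidate pairs, so edge counts grow quadratically with product popularity. Pairs shared across several products are stored only once.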
Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used to evaluate graph-based node classification, fraud detection, and anomaly detection models.

| # Nodes | Fraud % (Class=1) |
|-------|--------|
| 45,954 | 14.5 |

| Relation | # Edges |
|--------|--------|
| R-U-R | 49,315 |
| R-T-R | 573,616 |
| R-S-R | 3,402,743 |
| All | 3,846,979 |

Graph: based on previous studies showing that opinion fraudsters have connections in user, product, review text, and time, we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects
10 PAPERS • 2 BENCHMARKS
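Only 14.5% of the 45,954 review nodes are labeled as fraud (Class=1). Under that imbalance, an unweighted loss is dominated by the benign class. One common remedy is to weight each class inversely to its frequency, using w_c = n / (k · n_c). That is the same formula as scikit-learn's "balanced" heuristic. The sketch below is an illustrative option, not the training setup of any particular paper. The helper name is hypothetical, and the class counts are approximations derived from the percentage in the table.

```python
def inverse_frequency_weights(class_counts):
    """Illustrative 'balanced' class weights: w_c = n_total / (n_classes * n_c)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: total / (k * n) for c, n in class_counts.items()}


# Approximate counts derived from the table (14.5% of 45,954 nodes).
n_nodes = 45_954
n_fraud = round(0.145 * n_nodes)  # about 6,663 fraud reviews
counts = {0: n_nodes - n_fraud, 1: n_fraud}
weights = inverse_frequency_weights(counts)

print(counts)  # {0: 39291, 1: 6663}
print({c: round(w, 3) for c, w in weights.items()})  # {0: 0.585, 1: 3.448}
```

With these weights, each class contributes equally to the total loss weight: n_c · w_c = n / k for both classes.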
The Reddit dataset is a graph dataset built from Reddit posts made in September 2014. The node label is the community, or “subreddit”, that a post belongs to. Fifty large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on
589 PAPERS • 13 BENCHMARKS
AMZ Computers is a co-purchase graph extracted from Amazon, where nodes represent products, edges represent the co-purchased relations of products, and features are bag-of-words vectors extracted from
5 PAPERS • 1 BENCHMARK
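The node features here are bag-of-words vectors. The source text is truncated before it says what the words are extracted from. The minimal sketch below illustrates the encoding under several assumptions: a fixed vocabulary, lowercase whitespace tokenization, and raw counts as features. The released features may instead use binary indicators or a different tokenizer, and the vocabulary and function name are hypothetical.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a document to a count vector over a fixed (hypothetical) vocabulary.

    Tokens outside the vocabulary are ignored; lowercase whitespace
    tokenization is a simplifying assumption.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for token, count in Counter(text.lower().split()).items():
        if token in index:
            vector[index[token]] = count
    return vector


vocab = ["fast", "laptop", "screen", "battery"]
print(bag_of_words("Fast laptop fast battery", vocab))  # [2, 1, 0, 1]
```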
…Although existing space function classifiers use space adjacency or connectivity graphs as input, the application of Graph Deep Learning (GDL) to space layout element classification has not been extensively … To bridge this gap, we introduce a dataset named SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings designed or built between … Each access graph contains nodes representing spaces and space elements, and edges representing the connections between them.
MAG-Scholar-C was constructed by Bojchevski et al. based on Microsoft Academic Graph (MAG): nodes are papers, edges represent citation relations among papers, and features are bag-of-words
4 PAPERS • NO BENCHMARKS YET
…PATTERN tests the fundamental graph task of recognizing specific predetermined subgraphs.
122 PAPERS • 1 BENCHMARK
…Input graphs are used to represent chemical compounds, where vertices stand for atoms and are labeled by the atom type (represented by one-hot encoding), while edges between vertices represent bonds between
248 PAPERS • 3 BENCHMARKS
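Each vertex is labeled by its atom type as a one-hot vector. The sketch below illustrates that encoding. It assumes a small hypothetical atom-type list, whereas real datasets define their own, usually larger, type vocabulary and ordering. Unknown symbols raise an error instead of being silently mapped.

```python
ATOM_TYPES = ["C", "N", "O", "F", "S"]  # hypothetical vocabulary for illustration


def one_hot_atoms(atoms, atom_types=ATOM_TYPES):
    """Return one one-hot row per atom; unknown symbols raise a KeyError."""
    index = {symbol: i for i, symbol in enumerate(atom_types)}
    rows = []
    for symbol in atoms:
        row = [0] * len(atom_types)
        row[index[symbol]] = 1  # exactly one active position per vertex
        rows.append(row)
    return rows


# Heavy atoms of ethanol (C-C-O): one feature row per vertex.
print(one_hot_atoms(["C", "C", "O"]))
# [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]
```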
…The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations.
205 PAPERS • 5 BENCHMARKS
…We construct a multi-relation graph based on the supplier, customer, shareholder, and financial information disclosed in the financial statements of Chinese companies.
1 PAPER • 1 BENCHMARK
…Here we abstract the problem into a new benchmark for node classification in a geo-referenced graph. Solving it requires learning the spatial layout of the organ, including its symmetries.
Wiki-CS is a Wikipedia-based dataset for benchmarking Graph Neural Networks.
73 PAPERS • 2 BENCHMARKS
MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags), spanning 21 million tweets belonging to 26 thousand Twitter threads, each
4 PAPERS • 3 BENCHMARKS