23 dataset results for Graphs AND Node Classification

HeriGraph (Multimodal Machine Learning Datasets on Graphs of Heritage Values and Attributes)

The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links) constructed

1 PAPER • NO BENCHMARKS YET

Long Range Graph Benchmark (LRGB)

The Long Range Graph Benchmark (LRGB) is a collection of 5 graph learning datasets that arguably require long-range reasoning to achieve strong performance in a given task. The 5 datasets in this benchmark can be used to prototype new models that can capture long-range dependencies in graphs.

| Dataset | Domain | Task |
|---|---|---|
| PascalVOC-SP | Computer Vision | Node Classification |
| COCO-SP | Computer Vision | Node Classification |
| PCQM-Contact | Quantum Chemistry | Link Prediction |
| Peptides-func | Chemistry | Graph Classification |
| Peptides-struct | Chemistry | Graph Regression |

41 PAPERS • 5 BENCHMARKS

OGB-LSC (OGB Large-Scale Challenge)

OGB Large-Scale Challenge (OGB-LSC) is a collection of three real-world datasets for advancing the state of the art in large-scale graph ML. OGB-LSC provides graph datasets that are orders of magnitude larger than existing ones and covers three core graph learning tasks: link prediction, graph regression, and node classification. MAG240M-LSC is a heterogeneous academic graph, and the task is to predict the subject areas of papers situated in the heterogeneous graph (node classification). WikiKG90M-LSC is a knowledge graph, and the task is to impute missing triplets (link prediction). PCQM4M-LSC is a quantum chemistry dataset, and the task is to predict an important molecular property, the HOMO-LUMO gap, of a given molecule (graph regression).

31 PAPERS • 3 BENCHMARKS

Facebook Page-Page

This web graph is a page-page graph of verified Facebook sites. Nodes represent official Facebook pages while the links are mutual likes between sites. This graph was collected through the Facebook Graph API in November 2017 and restricted to pages from 4 categories that are defined by Facebook.

7 PAPERS • NO BENCHMARKS YET

PPI (Protein-Protein Interactions)

…protein roles, in terms of their cellular functions from gene ontology, in various protein-protein interaction (PPI) graphs, with each graph corresponding to a different human tissue [41]. …positional gene… The average graph contains 2373 nodes, with an average degree of 28.8.

285 PAPERS • 2 BENCHMARKS
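
The node count and average degree quoted in the PPI entry above imply a rough edge count per graph. The following is a minimal sketch, assuming the graphs are undirected so that average degree equals 2|E| / |N|; the entry itself does not state this, and the variable names are illustrative only.

```python
# Figures quoted in the PPI entry.
avg_nodes = 2373
avg_degree = 28.8

# Assumption: undirected graph, so each edge contributes to two node degrees
# and avg_degree = 2 * num_edges / num_nodes.
implied_edges = avg_nodes * avg_degree / 2

# Under that assumption, an average graph has roughly 34 thousand edges.
print(round(implied_edges))
```

If the average degree was instead computed over directed edges, the implied count would be twice as large.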

minesweeper

Minesweeper is a synthetic graph emulating the eponymous game.

15 PAPERS • 1 BENCHMARK

OGB (Open Graph Benchmark)

The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs.

813 PAPERS • 16 BENCHMARKS

roman-empire

Roman-empire is a word dependency graph based on the Roman Empire article from the English Wikipedia.

21 PAPERS • 1 BENCHMARK

questions

Questions is an interaction graph of users of a question-answering website based on data provided by Yandex Q.

21 PAPERS • 1 BENCHMARK

Placenta

Placenta is a benchmark dataset for node classification in an underexplored domain: predicting microanatomical tissue structures from cell graphs in placenta histology whole slide images. Cell graphs are large (>1 million nodes per image), node features are varied (64 dimensions of 11 types of cells), class labels are imbalanced (9 classes ranging from 0.21% of the data to 40.0%), and cellular

2 PAPERS • 1 BENCHMARK

Amazon-Fraud (Multi-relational Graph Dataset for Amazon Fraudulent Account Detection)

Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

| # Nodes | % Fraud Nodes (Class=1) |
|-------|--------|
| 11,944 | 9.5 |

| Relation | # Edges |
|--------|--------|
| U-P-U | 175,608 |
| U-S-U | 3,566,479 |
| U-V-U | 1,036,737 |
| All | 4,398,392 |

Graph: We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users reviewing at least one same product; 2) U-S-U: it connects users having at least one same star rating within

6 PAPERS • 2 BENCHMARKS
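
The U-P-U relation in the Amazon-Fraud entry above (users linked when they reviewed at least one common product) can be illustrated with a small sketch. This is not the dataset authors' construction code: the `(user_id, product_id)` input format, the function name `build_upu_edges`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_upu_edges(reviews):
    """Return undirected user-user edges for users sharing a reviewed product.

    `reviews` is an iterable of (user_id, product_id) pairs (hypothetical format).
    Each edge is a sorted (user_a, user_b) tuple, so duplicates collapse in the set.
    """
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)

    edges = set()
    for users in users_by_product.values():
        # Connect every pair of users who reviewed this product.
        for user_a, user_b in combinations(sorted(users), 2):
            edges.add((user_a, user_b))
    return edges


# Toy data: u1 and u2 share p1, u2 and u3 share p2, u4 shares nothing.
reviews = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2"), ("u4", "p3")]
upu_edges = build_upu_edges(reviews)
print(sorted(upu_edges))  # [('u1', 'u2'), ('u2', 'u3')]
```

Storing edges in a set means a pair of users who share several products still yields a single edge, which matches the "at least one same product" wording.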

Yelp-Fraud (Multi-relational Graph Dataset for Yelp Spam Review Detection)

Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

| # Nodes | % Fraud Nodes (Class=1) |
|-------|--------|
| 45,954 | 14.5 |

| Relation | # Edges |
|--------|--------|
| R-U-R | 49,315 |
| R-T-R | 573,616 |
| R-S-R | 3,402,743 |
| All | 3,846,979 |

Graph: Based on previous studies which show that opinion fraudsters have connections in user, product, review text, and time, we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects

10 PAPERS • 2 BENCHMARKS

Reddit

The Reddit dataset is a graph dataset from Reddit posts made in the month of September 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on

589 PAPERS • 13 BENCHMARKS

AMZ Computers (amazon_electronics_computers)

AMZ Computers is a co-purchase graph extracted from Amazon, where nodes represent products, edges represent the co-purchased relations of products, and features are bag-of-words vectors extracted from

5 PAPERS • 1 BENCHMARK

SAGC-A68 (A space access graph dataset for the classification of spaces and space elements in apartment buildings)

…Although existing space function classifiers use space adjacency or connectivity graphs as input, the application of Graph Deep Learning (GDL) to space layout element classification has not been extensively … To bridge this gap, we introduce a dataset named SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings designed or built between … Each access graph contains nodes representing spaces and space elements and edges representing the connection between them.

1 PAPER • NO BENCHMARKS YET

MAG-Scholar-C

MAG-Scholar-C is constructed by Bojchevski et al. based on Microsoft Academic Graph (MAG), in which nodes refer to papers, edges represent citation relations among papers and features are bag-of-words

4 PAPERS • NO BENCHMARKS YET

PATTERN

…PATTERN tests the fundamental graph task of recognizing specific predetermined subgraphs.

122 PAPERS • 1 BENCHMARK

MUTAG

…Input graphs are used to represent chemical compounds, where vertices stand for atoms and are labeled by the atom type (represented by one-hot encoding), while edges between vertices represent bonds between

248 PAPERS • 3 BENCHMARKS
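
The MUTAG entry above describes vertices labeled by atom type with a one-hot encoding. A minimal sketch of that encoding follows; the toy atom list, the bond list, and the function name `one_hot_atom_types` are hypothetical and do not come from the dataset or any particular loader.

```python
def one_hot_atom_types(atom_types):
    """Map each atom label to a one-hot vector over the sorted label vocabulary."""
    vocab = sorted(set(atom_types))
    index = {atom: i for i, atom in enumerate(vocab)}
    features = [
        [1 if index[atom] == j else 0 for j in range(len(vocab))]
        for atom in atom_types
    ]
    return vocab, features


# Toy compound (illustrative only): vertices are atoms, edges are bonds.
atoms = ["C", "C", "N", "O", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]

vocab, node_features = one_hot_atom_types(atoms)
print(vocab)          # ['C', 'N', 'O']
print(node_features)  # [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
```

In practice the vocabulary would be fixed across the whole dataset rather than derived per graph, so that feature dimensions line up between compounds.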

DBLP (Citation Network Dataset)

…The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations.

205 PAPERS • 5 BENCHMARKS

FDCompCN

…We construct a multi-relation graph based on the supplier, customer, shareholder, and financial information disclosed in the financial statements of Chinese companies.

1 PAPER • 1 BENCHMARK

CellTypeGraph Benchmark

…We here abstract the problem into a new benchmark for node classification in a geo-referenced graph. Solving it requires learning the spatial layout of the organ including symmetries.

1 PAPER • 1 BENCHMARK

Wiki-CS

Wiki-CS is a Wikipedia-based dataset for benchmarking Graph Neural Networks.

73 PAPERS • 2 BENCHMARKS

MuMiN

MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags), spanning 21 million tweets belonging to 26 thousand Twitter threads, each

4 PAPERS • 3 BENCHMARKS