The LINUX dataset consists of 48,747 Program Dependence Graphs (PDGs) generated from the Linux kernel. Each graph represents a function: a node represents one statement, and an edge represents the dependency between the two statements it connects.
13 PAPERS • NO BENCHMARKS YET
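To illustrate the statement-as-node, dependency-as-edge idea, here is a minimal sketch that builds only the data-dependence edges for straight-line code. It assumes a hypothetical input format in which each statement is given as `(defined_variable, used_variables)`. Real PDGs such as those in LINUX also contain control-dependence edges, which this sketch omits.

```python
def build_data_dependence_graph(statements):
    """Return directed edges (i, j): statement j uses a value last defined by statement i.

    Hypothetical input: a list of (defined_variable or None, list_of_used_variables),
    one entry per statement of straight-line code (no branches or loops).
    """
    last_def = {}  # variable name -> index of the most recent statement defining it
    edges = set()
    for j, (defined, used) in enumerate(statements):
        for var in used:
            if var in last_def:
                edges.add((last_def[var], j))  # data dependence: def -> use
        if defined is not None:
            last_def[defined] = j
    return edges


# int a = 1; int b = a + 2; a = b * 3; return a + b;
statements = [
    ("a", []),
    ("b", ["a"]),
    ("a", ["b"]),
    (None, ["a", "b"]),
]
print(sorted(build_data_dependence_graph(statements)))  # [(0, 1), (1, 2), (1, 3), (2, 3)]
```

Note that the return statement depends on statement 2 for `a` (the redefinition), not on statement 0, because only the most recent definition reaches it.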
MalNet is a large public graph database, representing a large-scale ontology of software function call graphs. MalNet contains over 1.2 million graphs, averaging over 17k nodes and 39k edges per graph, across a hierarchy of 47 types and 696 families.
13 PAPERS • 4 BENCHMARKS
The Long Range Graph Benchmark (LRGB) is a collection of 5 graph learning datasets that arguably require long-range reasoning to achieve strong performance in a given task. The 5 datasets in this benchmark can be used to prototype new models that can capture long-range dependencies in graphs.

| Dataset | Domain | Task |
|---|---|---|
| PascalVOC-SP | Computer Vision | Node Classification |
| COCO-SP | Computer Vision | Node Classification |
| PCQM-Contact | Quantum Chemistry | Link Prediction |
| Peptides-func | Chemistry | Graph Classification |
| Peptides-struct | Chemistry | Graph Regression |
41 PAPERS • 5 BENCHMARKS
Reddit12k contains 11,929 graphs, each corresponding to an online discussion thread, where nodes represent users and an edge indicates that one of the two users responded to the other's comment. Each of these 11,929 discussion graphs carries one of 11 graph labels, representing the category of the community.
24 PAPERS • NO BENCHMARKS YET
…In particular, graphs are isomorphic if they have the same degree and the task is to classify non-isomorphic graphs.
29 PAPERS • 2 BENCHMARKS
…A graph corresponds to a researcher’s ego network, i.e., the researcher and their collaborators are nodes, and an edge indicates collaboration between two researchers. The dataset has 5,000 graphs, and each graph has label 0, 1, or 2.
230 PAPERS • 2 BENCHMARKS
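An ego network, as described above, can be sketched as the center researcher plus all direct collaborators, together with every collaboration edge among those people. The edge-list input and the names below are hypothetical; this is not the dataset's actual construction code.

```python
def ego_network(collaborations, ego):
    """Return (nodes, edges) of the 1-hop ego network around `ego`.

    collaborations: iterable of (researcher_a, researcher_b) pairs (hypothetical format).
    Edges are returned as sorted tuples because collaboration is undirected.
    """
    members = {ego}
    for u, v in collaborations:
        if u == ego:
            members.add(v)
        elif v == ego:
            members.add(u)
    # keep every collaboration between two members, not only those touching the ego
    edges = {
        tuple(sorted((u, v)))
        for u, v in collaborations
        if u != v and u in members and v in members
    }
    return members, edges


collaborations = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "dave"), ("dave", "erin"),
]
nodes, edges = ego_network(collaborations, "alice")
print(sorted(nodes))  # ['alice', 'bob', 'carol']
print(sorted(edges))  # [('alice', 'bob'), ('alice', 'carol'), ('bob', 'carol')]
```

`dave` is excluded because he collaborated with `carol` but not with `alice`.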
Synthetic graph classification datasets with the task of recognizing the connectivity of same-colored nodes in 4 graphs of varying topology. The four Color-connectivity datasets were created by taking a graph and randomly coloring half of its nodes one color, e.g., red, and the other nodes blue, such that the red nodes either form a single … For the underlying graph topology, we used: 1) a 16×16 2D grid, 2) a 32×32 2D grid, 3) the Euroroad road network (Šubelj et al. 2011), and 4) the Minnesota road network. We sampled a balanced set of 15,000 coloring examples for each graph, except for the Minnesota network, for which we generated 6,000 examples due to memory constraints. The Color-connectivity task requires a combination of local and long-range graph information processing, to which most existing message-passing Graph Neural Networks (GNNs) do not scale.
1 PAPER • NO BENCHMARKS YET
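The following sketch colors half of an n×n grid red at random and checks whether the red cells form one connected component, using breadth-first search over 4-neighbor adjacency. It is only an illustration of the task under stated assumptions: it does not perform the balanced sampling used for the actual datasets, and the function names are hypothetical.

```python
import random
from collections import deque


def grid_neighbors(r, c, n):
    """Yield the 4-neighbors of cell (r, c) inside an n x n grid."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n and 0 <= cc < n:
            yield rr, cc


def red_nodes_connected(red, n):
    """True if the red cells form a single connected component (BFS from any red cell)."""
    if not red:
        return False
    start = next(iter(red))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nb in grid_neighbors(r, c, n):
            if nb in red and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(red)


def random_coloring(n, rng):
    """Color exactly half of the n*n cells red (plain random sampling, not balanced)."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return set(rng.sample(cells, len(cells) // 2))


rng = random.Random(0)
red = random_coloring(16, rng)
label = red_nodes_connected(red, 16)  # the binary target for this coloring
print(len(red), label)
```

Because a uniformly random half-coloring of a large grid is rarely connected, a balanced dataset needs a dedicated sampler for the connected class; that part is not reproduced here.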
AIDS is a graph dataset. It consists of 2,000 graphs representing molecular compounds, constructed from the AIDS Antiviral Screen Database of Active Compounds.
52 PAPERS • 1 BENCHMARK
REDDIT-BINARY consists of graphs corresponding to online discussions on Reddit. In each graph, nodes represent users, and there is an edge between two users if at least one of them responded to the other’s comment. A graph is labeled according to whether it belongs to a question/answer-based community or a discussion-based community.
137 PAPERS • 2 BENCHMARKS
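The edge rule above ("at least one of them responded to the other's comment") yields an undirected user graph. A minimal sketch, assuming a hypothetical list of `(replying_user, parent_comment_author)` pairs per thread, could look like this:

```python
def build_thread_graph(replies):
    """Build an undirected user graph for one discussion thread.

    replies: iterable of (replying_user, parent_comment_author) pairs (hypothetical format).
    Edges are stored as sorted tuples, so a reply in either direction yields the same edge.
    """
    nodes, edges = set(), set()
    for replier, author in replies:
        nodes.update((replier, author))
        if replier != author:  # assumption: replying to one's own comment adds no edge
            edges.add(tuple(sorted((replier, author))))
    return nodes, edges


replies = [("u1", "u2"), ("u2", "u1"), ("u3", "u1"), ("u2", "u2")]
nodes, edges = build_thread_graph(replies)
print(sorted(edges))  # [('u1', 'u2'), ('u1', 'u3')]
```

The two replies between `u1` and `u2` collapse into one edge, matching the "at least one of them" wording.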
TUDataset: A collection of benchmark datasets for learning with graphs
68 PAPERS • 1 BENCHMARK
…In each graph, nodes represent actors/actresses, and there is an edge between two of them if they appear in the same movie. These graphs are derived from the Action and Romance genres.
283 PAPERS • 2 BENCHMARKS
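The co-appearance rule can be sketched by connecting every pair of actors within each movie's cast. The `casts` mapping below is hypothetical toy data, not taken from IMDB.

```python
from itertools import combinations


def co_appearance_graph(casts):
    """Connect two actors whenever they appear in the same movie.

    casts: dict mapping a movie id to a list of actor names (hypothetical format).
    Returns (nodes, edges) with each undirected edge stored as a sorted tuple.
    """
    nodes, edges = set(), set()
    for actors in casts.values():
        unique = sorted(set(actors))
        nodes.update(unique)
        # every pair within one cast becomes an edge (a clique per movie)
        edges.update(combinations(unique, 2))
    return nodes, edges


casts = {"m1": ["A", "B", "C"], "m2": ["C", "D"]}
nodes, edges = co_appearance_graph(casts)
print(sorted(edges))  # [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

Each movie contributes a clique over its cast, so a graph built this way is the union of those cliques.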
The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs.
813 PAPERS • 16 BENCHMARKS
Sampled 2-hop subgraphs centered on Exchange accounts in the Ethereum Interaction graph.
Sampled 2-hop subgraphs centered on Mining accounts in the Ethereum Interaction graph.
Sampled 2-hop subgraphs centered on ICO-wallet accounts in the Ethereum Interaction graph.
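A 2-hop subgraph around an account can be sketched as a depth-limited breadth-first search followed by taking the induced edges. The adjacency format, the undirected treatment of interactions, and the optional `max_neighbors` cap are all assumptions for illustration; the actual sampling procedure behind these datasets is not described here.

```python
import random
from collections import deque


def two_hop_subgraph(adj, center, max_neighbors=None, rng=None):
    """Return (nodes, edges) of the induced subgraph within 2 hops of `center`.

    adj: dict node -> iterable of neighbors, treated as undirected (assumption).
    max_neighbors: if set, at most this many neighbors are sampled when a node is
        expanded (a hypothetical sampling rule, not necessarily the dataset's).
    """
    if rng is None:
        rng = random.Random(0)
    depth = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if depth[u] == 2:  # do not expand beyond the second hop
            continue
        nbrs = sorted(adj.get(u, ()))
        if max_neighbors is not None and len(nbrs) > max_neighbors:
            nbrs = rng.sample(nbrs, max_neighbors)
        for v in nbrs:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    nodes = set(depth)
    # induced subgraph: keep every edge whose endpoints were both reached
    edges = {tuple(sorted((u, v))) for u in nodes for v in adj.get(u, ()) if v in nodes}
    return nodes, edges


adj = {"ex": ["a", "b"], "a": ["ex", "c"], "b": ["ex"], "c": ["a", "d"], "d": ["c"]}
nodes, edges = two_hop_subgraph(adj, "ex")
print(sorted(nodes))  # ['a', 'b', 'c', 'ex']
print(sorted(edges))  # [('a', 'c'), ('a', 'ex'), ('b', 'ex')]
```

Node `d` is three hops from `ex`, so it is excluded.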
…The dataset has been integrated with PyTorch Geometric (PyG) and Deep Graph Library (DGL). You can load the dataset after installing the latest versions of PyG or DGL. The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The news retweet graphs were originally extracted by FakeNewsNet. Each graph is a hierarchical tree-structured graph where the root node represents the news and the leaf nodes are Twitter users who retweeted the root news. The dataset statistics are shown below: | Data | #Graphs | #Fake News | #Total Nodes | #Total Edges | #Avg.
7 PAPERS • 2 BENCHMARKS
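The tree structure described above can be sketched by indexing the news item as node 0 and attaching each retweeting user under a parent. The record format `(user, parent)` is hypothetical, and letting a user's parent be another user (rather than always the news) is an assumption made to show the hierarchical case. For actual use, the PyG and DGL loaders mentioned above are the intended route.

```python
def build_propagation_tree(news_id, retweets):
    """Build a tree with the news as root (index 0) and users as descendants.

    retweets: list of (user, parent) pairs, where parent is news_id or a user
        already added (hypothetical format).
    Returns (index, edges): node-id -> integer index, and (parent_idx, child_idx) edges.
    """
    index = {news_id: 0}
    edges = []
    for user, parent in retweets:
        if parent not in index:
            raise ValueError(f"parent {parent!r} of {user!r} has not been added yet")
        if user in index:
            continue  # keep only the first retweet per user so the graph stays a tree
        index[user] = len(index)
        edges.append((index[parent], index[user]))
    return index, edges


index, edges = build_propagation_tree("n1", [("u1", "n1"), ("u2", "n1"), ("u3", "u1")])
print(edges)  # [(0, 1), (0, 2), (1, 3)]
```

A tree on N nodes has exactly N - 1 edges, which is a quick sanity check for graphs built this way.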
PTC is a collection of 344 chemical compounds represented as graphs, labeled by their carcinogenicity for rats. There are 19 distinct node labels.
101 PAPERS • 1 BENCHMARK
…It is a synthetic dataset that contains 1,000 graphs divided into two classes according to the motif they contain: either a “house” or a five-node cycle.
39 PAPERS • 1 BENCHMARK
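The two motifs differ in structure even though both have five nodes: the house is a 4-cycle with a "roof" node joined to two adjacent corners (6 edges, containing a triangle), while the cycle has 5 edges and every node has degree 2. The sketch below builds only the motifs themselves, not the rest of each graph, and the node numbering is an arbitrary choice.

```python
from collections import Counter


def house_motif():
    """Square 0-1-2-3 plus roof node 4 attached to nodes 0 and 1."""
    return [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)]


def cycle_motif(k=5):
    """A k-node cycle 0-1-...-(k-1)-0."""
    return [(i, (i + 1) % k) for i in range(k)]


def degree_sequence(edges):
    """Sorted node degrees of an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())


print(degree_sequence(house_motif()))  # [2, 2, 2, 3, 3]
print(degree_sequence(cycle_motif()))  # [2, 2, 2, 2, 2]
```

Since both motifs have the same node count, a model must rely on structure (for example degrees or the triangle in the roof) to tell the classes apart.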
The Reddit dataset is a graph dataset built from Reddit posts made in September 2014. The node label is the community, or “subreddit”, that a post belongs to. 50 large communities were sampled to build a post-to-post graph, connecting posts if the same user comments on both.
589 PAPERS • 13 BENCHMARKS
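The post-to-post rule can be sketched by grouping comments by user and linking every pair of posts a user commented on. The `(user, post_id)` input format is hypothetical, and the sketch applies the stated rule directly without any further filtering the original construction may have used.

```python
from collections import defaultdict
from itertools import combinations


def post_to_post_edges(comments):
    """Connect two posts if the same user commented on both.

    comments: iterable of (user, post_id) pairs (hypothetical format).
    Returns a set of undirected edges as sorted (post_a, post_b) tuples.
    """
    posts_by_user = defaultdict(set)
    for user, post in comments:
        posts_by_user[user].add(post)
    edges = set()
    for posts in posts_by_user.values():
        # a user who commented on k posts contributes k * (k - 1) / 2 pairs
        edges.update(combinations(sorted(posts), 2))
    return edges


comments = [("u1", "p1"), ("u1", "p2"), ("u2", "p2"), ("u2", "p3"), ("u3", "p4")]
print(sorted(post_to_post_edges(comments)))  # [('p1', 'p2'), ('p2', 'p3')]
```

The pair enumeration is quadratic in the number of posts per user, which matters for very active commenters on a graph of this size.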
…The available screen results are chemical graph-structured data of these various compounds.
0 PAPERS • NO BENCHMARKS YET
…Input graphs are used to represent chemical compounds, where vertices stand for atoms and are labeled by the atom type (represented by one-hot encoding), while edges between vertices represent bonds between
248 PAPERS • 3 BENCHMARKS
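One-hot encoding of atom types, as mentioned above, turns each vertex label into a binary vector with a single 1 at the index of its atom type. The vocabulary below is a hypothetical example; real datasets define their own label set.

```python
def one_hot_atoms(atoms, vocabulary):
    """Return one row per atom, with a 1 at the position of its type in `vocabulary`.

    Raises KeyError for an atom type missing from the vocabulary.
    """
    index = {symbol: i for i, symbol in enumerate(vocabulary)}
    features = []
    for atom in atoms:
        row = [0] * len(vocabulary)
        row[index[atom]] = 1
        features.append(row)
    return features


vocabulary = ["C", "N", "O", "Cl"]  # hypothetical atom-type vocabulary
print(one_hot_atoms(["C", "O", "N"], vocabulary))
# [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
```

Each row sums to 1, and the feature width equals the number of distinct atom types in the vocabulary.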
The NCI1 dataset comes from the cheminformatics domain, where each input graph is used as a representation of a chemical compound: each vertex stands for an atom of the molecule, and edges between vertices
229 PAPERS • 2 BENCHMARKS