22 dataset results for Graphs AND Graph Classification

The LINUX dataset consists of 48,747 Program Dependence Graphs (PDGs) generated from the Linux kernel. Each graph represents a function, where a node represents one statement and an edge represents a dependency between two statements.

13 PAPERS • NO BENCHMARKS YET

MalNet

MalNet is a large public graph database, representing a large-scale ontology of software function call graphs. MalNet contains over 1.2 million graphs, averaging over 17k nodes and 39k edges per graph, across a hierarchy of 47 types and 696 families.

13 PAPERS • 4 BENCHMARKS

Long Range Graph Benchmark (LRGB)

The Long Range Graph Benchmark (LRGB) is a collection of 5 graph learning datasets that arguably require long-range reasoning to achieve strong performance in a given task. The 5 datasets in this benchmark can be used to prototype new models that can capture long-range dependencies in graphs.

| Dataset | Domain | Task |
|---|---|---|
| PascalVOC-SP | Computer Vision | Node Classification |
| COCO-SP | Computer Vision | Node Classification |
| PCQM-Contact | Quantum Chemistry | Link Prediction |
| Peptides-func | Chemistry | Graph Classification |
| Peptides-struct | Chemistry | Graph Regression |

41 PAPERS • 5 BENCHMARKS

REDDIT-12K

Reddit12k contains 11929 graphs, each corresponding to an online discussion thread where nodes represent users, and an edge represents the fact that one of the two users responded to the comment of the other. Each of these 11929 discussion graphs is associated with 1 of 11 graph labels, representing the category of the community.

24 PAPERS • NO BENCHMARKS YET

CSL

…In particular, graphs are isomorphic if they have the same degree and the task is to classify non-isomorphic graphs.

29 PAPERS • 2 BENCHMARKS

COLLAB

…A graph corresponds to a researcher’s ego network, i.e., the researcher and their collaborators are nodes and an edge indicates collaboration between two researchers. The dataset has 5,000 graphs and each graph has label 0, 1, or 2.

230 PAPERS • 2 BENCHMARKS

Color-connectivity

Synthetic graph classification datasets with the task of recognizing the connectivity of same-colored nodes in 4 graphs of varying topology. The four Color-connectivity datasets were created by taking a graph and randomly coloring half of its nodes one color, e.g., red, and the other nodes blue, such that the red nodes either form a single … For the underlying graph topology we used: 1) 16x16 2D grid, 2) 32x32 2D grid, 3) Euroroad road network (Šubelj et al. 2011), and 4) Minnesota road network. We sampled a balanced set of 15,000 coloring examples for each graph, except for the Minnesota network, for which we generated 6,000 examples due to memory constraints. The Color-connectivity task requires a combination of local and long-range graph information processing to which most existing message-passing Graph Neural Networks (GNNs) do not scale.
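The coloring step described above can be sketched in plain Python. This is a minimal illustration, not the authors' generator: the helper names (`grid_graph`, `random_half_coloring`, `count_components`) are hypothetical and 4-neighbour grid connectivity is an assumption. Because the description of the label is cut off in the text above, the sketch only counts connected components among the red nodes and does not assign a class label. The balanced sampling of examples is not reproduced either.

```python
import random
from collections import deque


def grid_graph(rows, cols):
    """Adjacency lists for a rows x cols 2D grid (4-neighbour connectivity assumed)."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r, c in adj:
        # Link each node to its lower and right neighbour, in both directions.
        for dr, dc in ((1, 0), (0, 1)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].append(nb)
                adj[nb].append((r, c))
    return adj


def random_half_coloring(adj, rng):
    """Return the set of nodes colored red: a uniformly random half of all nodes."""
    nodes = sorted(adj)
    return set(rng.sample(nodes, len(nodes) // 2))


def count_components(adj, colored):
    """Number of connected components in the subgraph induced by `colored`."""
    seen = set()
    components = 0
    for start in colored:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb in colored and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return components


if __name__ == "__main__":
    adj = grid_graph(16, 16)
    red = random_half_coloring(adj, random.Random(0))
    print("red nodes:", len(red), "red components:", count_components(adj, red))
```

A uniformly random coloring like this one will usually split the red nodes into many components, so a real generator would need some additional constraint or rejection step to reach the balanced sets mentioned above.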

1 PAPER • NO BENCHMARKS YET

AIDS

AIDS is a graph dataset. It consists of 2000 graphs representing molecular compounds which are constructed from the AIDS Antiviral Screen Database of Active Compounds.

52 PAPERS • 1 BENCHMARK

REDDIT-BINARY

REDDIT-BINARY consists of graphs corresponding to online discussions on Reddit. In each graph, nodes represent users, and there is an edge between them if at least one of them responds to the other’s comment. A graph is labeled according to whether it belongs to a question/answer-based community or a discussion-based community.
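The edge rule described above (an undirected edge whenever at least one of two users responds to the other) can be sketched as follows. This is an illustrative assumption about the construction, not the dataset's actual preprocessing code; `build_thread_graph` and the `(responder, author)` input format are hypothetical, and dropping self-replies is a choice made here for simplicity.

```python
def build_thread_graph(replies):
    """Build an undirected user graph from (responder, author) reply pairs.

    Two users are connected if at least one of them replied to the other,
    so replies in both directions collapse into a single edge.
    """
    nodes = set()
    edges = set()
    for responder, author in replies:
        nodes.update((responder, author))
        # Assumption: a user replying to their own comment adds no edge.
        if responder != author:
            edges.add(frozenset((responder, author)))
    return nodes, edges


if __name__ == "__main__":
    thread = [("bob", "alice"), ("alice", "bob"), ("carol", "alice")]
    nodes, edges = build_thread_graph(thread)
    print(len(nodes), "users,", len(edges), "edges")
```

Storing each edge as a `frozenset` makes the pair unordered, which matches the "at least one of them" wording.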

137 PAPERS • 2 BENCHMARKS

NCI109

Tudataset: A collection of benchmark datasets for learning with graphs

68 PAPERS • 1 BENCHMARK

IMDB-BINARY

…In each graph, nodes represent actors/actresses, and there is an edge between them if they appear in the same movie. These graphs are derived from the Action and Romance genres.

283 PAPERS • 2 BENCHMARKS

OGB (Open Graph Benchmark)

The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs.

813 PAPERS • 16 BENCHMARKS

Eth-Exchange

Eth-Exchange (Exchange in Ethereum)

The sampled 2-hop subgraphs centered on Exchange accounts on the Ethereum Interaction graph.
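A 2-hop subgraph around a center account can be extracted with a bounded breadth-first search. The sketch below is a generic illustration only: the function name `k_hop_subgraph` and the adjacency-dictionary input are hypothetical, the interaction graph is treated as undirected, and the sampling procedure the dataset authors applied is not described in the text above and is not reproduced.

```python
from collections import deque


def k_hop_subgraph(adj, center, k=2):
    """Return (nodes, edges) of the subgraph within k hops of `center`.

    `adj` maps each node to an iterable of neighbours (assumed undirected).
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    nodes = set(dist)
    # Keep every edge whose two endpoints both fall inside the k-hop ball.
    edges = {
        frozenset((u, v))
        for u in nodes
        for v in adj.get(u, ())
        if v in nodes and u != v
    }
    return nodes, edges


if __name__ == "__main__":
    toy = {"x": ["a", "b"], "a": ["x", "c"], "b": ["x"], "c": ["a", "d"], "d": ["c"]}
    nodes, edges = k_hop_subgraph(toy, "x", k=2)
    print(sorted(nodes), len(edges))
```

The same routine would apply to the Mining and ICO-wallet subgraphs below by changing the center account.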

1 PAPER • NO BENCHMARKS YET

Eth-Mining

Eth-Mining (Mining in Ethereum)

The sampled 2-hop subgraphs centered on Mining accounts on the Ethereum Interaction graph.

1 PAPER • NO BENCHMARKS YET

Eth-ICO

Eth-ICO (ICO-wallets in Ethereum)

The sampled 2-hop subgraphs centered on ICO-wallet accounts on the Ethereum Interaction graph.

1 PAPER • NO BENCHMARKS YET

UPFD (User Preference-aware Fake News Detection)

…The dataset has been integrated with PyTorch Geometric (PyG) and Deep Graph Library (DGL). You can load the dataset after installing the latest versions of PyG or DGL. The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The news retweet graphs were originally extracted by FakeNewsNet. Each graph is a hierarchical tree-structured graph where the root node represents the news; the leaf nodes are Twitter users who retweeted the root news. The dataset statistics are shown below: | Data | #Graphs | #Fake News| #Total Nodes | #Total Edges | #Avg.

7 PAPERS • 2 BENCHMARKS

PTC

PTC (Predictive Toxicology Challenge)

PTC is a collection of 344 chemical compounds represented as graphs which report the carcinogenicity for rats. There are 19 node labels for each node.

101 PAPERS • 1 BENCHMARK

BA-2motifs

…It's a synthetic dataset, which contains 1000 graphs divided into two classes according to the motif they contain: either a “house” or a five-node cycle.

39 PAPERS • 1 BENCHMARK

The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on

589 PAPERS • 13 BENCHMARKS

AIDS Antiviral Screen

…The available screen results are chemical graph-structured data of these various compounds.

0 PAPERS • NO BENCHMARKS YET

MUTAG

…Input graphs are used to represent chemical compounds, where vertices stand for atoms and are labeled by the atom type (represented by one-hot encoding), while edges between vertices represent bonds between

248 PAPERS • 3 BENCHMARKS

NCI1

The NCI1 dataset comes from the cheminformatics domain, where each input graph is used as representation of a chemical compound: each vertex stands for an atom of the molecule, and edges between vertices

229 PAPERS • 2 BENCHMARKS
