Provides detailed, graph-based annotations of social situations depicted in movie clips. Each graph consists of several types of nodes that capture who is present in the clip, their emotional and physical attributes, their relationships (e.g., parent/child), and the interactions between them.
12 PAPERS • NO BENCHMARKS YET
WikiGraphs is a dataset of Wikipedia articles, each paired with a knowledge graph, to facilitate research in conditional text generation, graph generation, and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (one or a few sentences), thus limiting the capabilities of the models that can be learned on the data. WikiGraphs is collected by pairing each Wikipedia article from the established WikiText-103 benchmark with a subgraph from the Freebase knowledge graph. Both the graph and the text data are of significantly larger scale compared to prior graph-text paired datasets.
3 PAPERS • 1 BENCHMARK
SketchGraphs is a dataset of 15 million sketches extracted from real-world CAD models intended to facilitate research in both ML-aided design and geometric program induction. Each sketch is represented as a geometric constraint graph where edges denote designer-imposed geometric relationships between primitives, the nodes of the graph.
11 PAPERS • NO BENCHMARKS YET
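As a toy illustration (not the actual SketchGraphs schema), a constraint graph of this kind can be modeled with primitives as nodes and designer-imposed constraints as labeled edges:

```python
# Toy sketch of a geometric constraint graph: primitives are nodes,
# designer-imposed geometric relationships are labeled edges.
# Node ids, primitive kinds, and constraint names are illustrative.
primitives = {0: "line", 1: "line", 2: "circle"}
constraints = [
    (0, 1, "perpendicular"),  # the two lines meet at 90 degrees
    (1, 2, "tangent"),        # the circle touches line 1
]

def constrained_pairs(constraints):
    """Return the set of primitive pairs linked by some constraint."""
    return {(u, v) for u, v, _ in constraints}
```

A model for geometric program induction would predict both the primitive nodes and the constraint edges of such a graph.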
https://doi.org/10.21227/gmd9-1534
2 PAPERS • NO BENCHMARKS YET
This dataset is a collection of undirected and unweighted LFR benchmark graphs as proposed by Lancichinetti et al. For each configuration we provide either 50 or 100 benchmark graphs. Benchmark graphs are given in edge list format. Reference: Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms.
1 PAPER • NO BENCHMARKS YET
The LINUX dataset consists of 48,747 Program Dependence Graphs (PDG) generated from the Linux kernel. Each graph represents a function, where a node represents one statement and an edge represents the dependency between two statements.
13 PAPERS • NO BENCHMARKS YET
The ground-truth betweenness centralities for the real-world graphs are provided by AlGhamdi et al. (2017) and are computed by a parallel implementation of Brandes' algorithm on a 96,000-core supercomputer. The ground-truth scores for the synthetic networks are provided by Fan et al. (2019) and are computed using the graph-tool library (Peixoto, 2014). DrBC (Fan et al., 2019) is a shallow graph convolutional network that outputs a ranking score for each node by propagating through the neighbors with a walk length of 5.
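For small graphs, exact betweenness centrality can be computed with Brandes' algorithm directly; the sketch below is a plain-Python version for unweighted, undirected graphs given as adjacency lists (the referenced implementations are, of course, far more optimized):

```python
from collections import deque

def brandes_betweenness(adj):
    """Exact betweenness centrality (Brandes, 2001) for an
    unweighted, undirected graph given as {node: [neighbors]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, tracking shortest-path counts and predecessors
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair is counted in both directions
    return {v: c / 2 for v, c in bc.items()}
```

On a path graph 0-1-2, the middle node lies on the single shortest path between the endpoints and gets a score of 1.0.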
The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links).
MalNet is a large public graph database, representing a large-scale ontology of software function call graphs. MalNet contains over 1.2 million graphs, averaging over 17k nodes and 39k edges per graph, across a hierarchy of 47 types and 696 families.
13 PAPERS • 4 BENCHMARKS
A special scene graph for intelligent vehicles. Unlike classical data representations, this graph provides not only object proposals but also their pair-wise relationships. By organizing them in a topological graph, these data are explainable, fully connected, and can be easily processed by GCNs (Graph Convolutional Networks).
VesselGraph is a dataset of whole-brain vessel graphs based on specific imaging protocols. Vascular graphs are extracted using a refined graph extraction scheme leveraging the volume rendering engine Voreen and are provided in an accessible and adaptable form through the OGB and PyTorch dataset formats.
GAP is a graph processing benchmark suite with the goal of helping to standardize graph processing evaluations. The benchmark not only specifies graph kernels, input graphs, and evaluation methodologies, but it also provides optimized baseline implementations. Graph framework developers can demonstrate the generality of their programming model by implementing all of the benchmark's kernels and delivering competitive performance on all of the benchmark's graphs. Algorithm designers can use the input graphs and the baseline implementations to demonstrate their contribution. Platform designers and performance analysts can use the suite as a workload representative of graph processing.
48 PAPERS • 1 BENCHMARK
EventNarrative is a knowledge graph-to-text dataset from publicly available open-world knowledge graphs. EventNarrative consists of approximately 230,000 graphs and their corresponding natural language text.
2 PAPERS • 1 BENCHMARK
We release 280 synthetic IAM graphs generated from IAM graphs of commercial companies. Specifically, we vary the number of nodes but keep graph density as in the real graphs, i.e., in the range of 0.259 ± 0.198 (avg ± std). After fixing node counts, we sample the actual nodes with replacement from a real-world graph chosen at random. Then we add Gaussian N(0, 0.01) noise to the node embeddings and renormalize them. A synthetic graph generated in this way is an "upsampled" version of an underlying real-world graph.
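The upsampling procedure described above can be sketched as follows (function and parameter names are illustrative, not from the released code):

```python
import math
import random

def upsample_embeddings(real_embeddings, n_new, noise_std=0.01, seed=0):
    """Illustrative sketch of the 'upsampling' step: sample node
    embeddings with replacement from a real graph, perturb each with
    Gaussian N(0, noise_std**2) noise, then renormalize to unit length."""
    rng = random.Random(seed)
    sampled = [rng.choice(real_embeddings) for _ in range(n_new)]
    noisy = []
    for emb in sampled:
        v = [x + rng.gauss(0.0, noise_std) for x in emb]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        noisy.append([x / norm for x in v])
    return noisy
```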
The Long Range Graph Benchmark (LRGB) is a collection of 5 graph learning datasets that arguably require long-range reasoning to achieve strong performance in a given task. The 5 datasets in this benchmark can be used to prototype new models that can capture long-range dependencies in graphs.

| Dataset | Domain | Task |
|---|---|---|
| PascalVOC-SP | Computer Vision | Node Classification |
| COCO-SP | Computer Vision | Node Classification |
| PCQM-Contact | Quantum Chemistry | Link Prediction |
| Peptides-func | Chemistry | Graph Classification |
| Peptides-struct | Chemistry | Graph Regression |
47 PAPERS • 5 BENCHMARKS
GenWiki is a large-scale dataset for knowledge graph-to-text (G2T) and text-to-knowledge graph (T2G) conversion. It was introduced in the paper "GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation" by Zhijing Jin, Qipeng Guo, Xipeng Qiu, and Zheng Zhang at COLING 2020.
7 PAPERS • 2 BENCHMARKS
OGB Large-Scale Challenge (OGB-LSC) is a collection of three real-world datasets for advancing the state-of-the-art in large-scale graph ML. OGB-LSC provides graph datasets that are orders of magnitude larger than existing ones and covers three core graph learning tasks -- link prediction, graph regression, and node classification. MAG240M-LSC is a heterogeneous academic graph, and the task is to predict the subject areas of papers situated in the heterogeneous graph (node classification). WikiKG90M-LSC is a knowledge graph, and the task is to impute missing triplets (link prediction). PCQM4M-LSC is a quantum chemistry dataset, and the task is to predict an important molecular property, the HOMO-LUMO gap, of a given molecule (graph regression).
31 PAPERS • 3 BENCHMARKS
YoutubeGraph-Dyn is an evolving graph dataset generated from YouTube real-world interactions. It can be used to study temporal evolution on graphs. YoutubeGraph-Dyn provides intra-day time granularity (with 416 snapshots taken every 6 hours over a period of 104 days), multi-modal relationships that capture different aspects of the data, and multiple attributes.
This dataset comprises 500 summarized Wikipedia articles, each accompanied by a corresponding TTL knowledge graph. All articles and their associated knowledge graphs are consolidated into a single CSV file named wiki.csv, where each row represents one article; a companion file, all_ttl.txt, contains all 500 knowledge graphs. The dataset can be used for various natural language processing tasks, such as text summarization, knowledge graph construction, and information retrieval.
0 PAPERS • NO BENCHMARKS YET
…Training graph contains 46K entities, 130 relations, 202K triples. Inference graph contains 30K entities, 130 relations, 77K triples. Validation and test triples to predict belong to the inference graph.
1 PAPER • 1 BENCHMARK
The Graphine dataset contains 2,010,648 terminology definition pairs organized in 227 directed acyclic graphs. Each node in the graph is associated with a terminology and its definition. Terminologies are organized from coarse-grained ones to fine-grained ones in each graph.
…Training graph contains 10K entities, 96 relations, 78K triples. Inference graph contains 7K entities, 96 relations, 21K triples. Validation and test triples to predict belong to the inference graph.
The KACC benchmark consists of three subtasks that can be applied to knowledge graphs: knowledge abstraction, knowledge concretization, and knowledge completion. The knowledge abstraction subtask contains tasks of concept inference, schema prediction, and concept graph completion on the two-view KG. The knowledge concretization subtask requires models to do entity graph completion based on the two subgraphs. The knowledge completion subtask consists of typical single-view knowledge graph completion tasks for each subgraph. KACC contains 999,902 entities in the entity graph, with 691 types of relations. The concept graph contains 21,293 concepts with 198 types of meta-relations. There are 2,367,971 cross-links between the two graphs.
OCB contains two graph datasets, Ckt-Bench-101 and Ckt-Bench-301, for representation learning over analog circuits. Ckt-Bench-101 and Ckt-Bench-301 contain graphs (DAGs) that represent analog circuits and provide their corresponding graph-level properties: DC gain (Gain), bandwidth (BW), phase margin (PM), and figure of merit (FoM). Tasks: graph-level prediction/regression; analog circuit search (ACS). OCB is the first open-source benchmark for graph learning in analog circuits.
SciGraphQA is a large-scale, open-domain dataset focused on generating multi-turn conversational question-answering dialogues centered around understanding and describing scientific graphs and figures. Each sample in SciGraphQA consists of a scientific graph image sourced from papers on ArXiv, accompanied by rich textual context including the paper's title, abstract, figure caption, and a related paragraph. The key motivation behind SciGraphQA is to provide a large-scale resource to support research and development of multi-modal AI systems that can engage in informative, open-ended conversations about graphs. Potential use cases of SciGraphQA include pre-training and benchmarking multi-modal conversational models for scientific graph comprehension and building AI assistants that can discuss data insights. The academic source material also provides a way to evaluate model capabilities on expert-level graphs spanning diverse topics and complex visual encodings.
Reddit12k contains 11,929 graphs, each corresponding to an online discussion thread where nodes represent users and an edge represents the fact that one of the two users responded to the other's comment. Each of these 11,929 discussion graphs carries 1 of 11 graph labels, representing the category of the community.
24 PAPERS • NO BENCHMARKS YET
Graph Robustness Benchmark (GRB) provides scalable, unified, modular, and reproducible evaluation of the adversarial robustness of graph machine learning models. GRB has elaborated datasets, a unified evaluation pipeline, a modular coding framework, and reproducible leaderboards, which facilitate the development of graph adversarial learning.
4 PAPERS • NO BENCHMARKS YET
MMKG is a collection of three knowledge graphs for link prediction and entity matching research. Contrary to other knowledge graph datasets, these knowledge graphs contain both numerical features and images for all entities as well as entity alignments between pairs of KGs. The three knowledge graphs augmented with numerical features and images are called FB15k, YAGO15k, and DBPEDIA15k.
43 PAPERS • 5 BENCHMARKS
The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, and conferences.
117 PAPERS • 1 BENCHMARK
CoDEx comprises a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false.
4 PAPERS • 1 BENCHMARK
This webgraph is a page-page graph of verified Facebook sites. Nodes represent official Facebook pages while the links are mutual likes between sites. This graph was collected through the Facebook Graph API in November 2017 and restricted to pages from 4 categories which are defined by Facebook.
7 PAPERS • NO BENCHMARKS YET
This dataset was originally created for the Knowledge Graph Reasoning Challenge for Social Issues (KGRC4SI). It includes video data that simulates daily-life actions in a virtual space from scenario data; knowledge graphs and transcriptions of the video content ("who" did what "action" with what "object," when and where, and the resulting "state" or "position" of the object); and knowledge graph embedding data created for reasoning based on machine learning.
IMCPT-SparseGM is a new visual graph matching benchmark addressing partial matching and graphs with larger sizes, based on the novel stereo benchmark Image Matching Challenge PhotoTourism (IMC-PT). This dataset is released in the CVPR 2023 paper "Deep Learning of Partial Graph Matching via Differentiable Top-K".
Wikidata5m is a million-scale knowledge graph dataset with aligned corpus. This dataset integrates the Wikidata knowledge graph and Wikipedia pages. The dataset is distributed as a knowledge graph, a corpus, and aliases. We provide both transductive and inductive data splits used in the original paper.
46 PAPERS • 1 BENCHMARK
This dataset concerns classifying protein roles (in terms of their cellular functions from gene ontology) in various protein-protein interaction (PPI) graphs, with each graph corresponding to a different human tissue. The average graph contains 2,373 nodes, with an average degree of 28.8.
286 PAPERS • 2 BENCHMARKS
…In particular, graphs are isomorphic if they have the same degree and the task is to classify non-isomorphic graphs.
29 PAPERS • 2 BENCHMARKS
…Our representations range from simple graphs capturing character co-occurrence in single scenes to hypergraphs encoding complex communication settings and character contributions as hyperedges. By making multiple intuitive representations readily available for experimentation, we facilitate rigorous representation-robustness checks in graph learning, graph mining, and network analysis.
The WorldKG knowledge graph is a comprehensive large-scale geospatial knowledge graph based on OpenStreetMap that provides a semantic representation of geographic entities from over 188 countries. WorldKG contains a higher number of representations of geographic entities compared to other knowledge graphs and can be used as an underlying data source for various applications such as geospatial question answering.
5 PAPERS • NO BENCHMARKS YET
ENT-DESC involves retrieving abundant knowledge of various types about main entities from a large knowledge graph (KG), a setting that poses severe problems for current graph-to-sequence models.
…A graph corresponds to a researcher’s ego network, i.e., the researcher and its collaborators are nodes and an edge indicates collaboration between two researchers. The dataset has 5,000 graphs and each graph has label 0, 1, or 2.
233 PAPERS • 2 BENCHMARKS
GraphQuestions is a characteristic-rich dataset designed for factoid question answering. It consists of 5,166 factoid questions, each associated with logical forms and ground-truth answers.
13 PAPERS • 2 BENCHMARKS
Synthetic graph classification datasets with the task of recognizing the connectivity of same-colored nodes in 4 graphs of varying topology. The four Color-connectivity datasets were created by taking a graph and randomly coloring half of its nodes one color, e.g., red, and the other nodes blue, such that the red nodes either form a single connected component or not. For the underlying graph topology we used: 1) a 16x16 2D grid, 2) a 32x32 2D grid, 3) the Euroroad road network (Šubelj et al. 2011), and 4) the Minnesota road network. We sampled a balanced set of 15,000 coloring examples for each graph, except for the Minnesota network, for which we generated 6,000 examples due to memory constraints. The Color-connectivity task requires a combination of local and long-range graph information processing to which most existing message-passing Graph Neural Networks (GNNs) do not scale.
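The sample-generation idea can be sketched as follows (a simplification of the described procedure; names are illustrative): color half the nodes red and label the sample by whether the red nodes form a single connected component.

```python
import random
from collections import deque

def color_connectivity_sample(adj, seed=0):
    """Illustrative sketch of one Color-connectivity sample: color
    half of a graph's nodes 'red' at random and label the sample by
    whether the red nodes are mutually connected through red nodes."""
    rng = random.Random(seed)
    nodes = list(adj)
    red = set(rng.sample(nodes, len(nodes) // 2))
    # BFS restricted to red nodes to test their connectivity
    start = next(iter(red))
    seen, q = {start}, deque([start])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w in red and w not in seen:
                seen.add(w)
                q.append(w)
    label = int(seen == red)  # 1 iff red nodes form one component
    return red, label
```

In a complete graph any red subset is connected (label 1); in a graph with no edges, any two red nodes are disconnected (label 0).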
A random sample from Pubmed Knowledge Graph.
The Human Phenotype Ontology (HPO) graph is a standardized vocabulary of human phenotypic abnormalities and their relationships. It represents these abnormalities as nodes in a graph, with edges indicating relationships such as subtypes or overlapping features. The HPO graph is organized in a hierarchical structure, with more general terms at the top and more specific terms at the bottom.
Arxiv HEP-TH (high energy physics theory) citation graph is from the e-print arXiv and covers all the citations within a dataset of 27,770 papers with 352,807 edges. If a paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph does not contain any information about this.
34 PAPERS • 9 BENCHMARKS
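The edge convention described above can be illustrated with a small helper (hypothetical names) that builds out-adjacency and citation counts from a directed edge list:

```python
def build_citation_graph(edges):
    """Given directed edges (i, j) meaning 'paper i cites paper j',
    return the papers cited by each i and the in-degree (number of
    citations received) of each j."""
    cites, cited_by = {}, {}
    for i, j in edges:
        cites.setdefault(i, []).append(j)
        cited_by[j] = cited_by.get(j, 0) + 1
    return cites, cited_by
```

Note that, per the description, citations to or from papers outside the dataset are simply absent from the edge list.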