108 dataset results for Graphs AND English

WikiGraphs is a dataset of Wikipedia articles each paired with a knowledge graph, to facilitate the research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (1 or few sentences), thus limiting the capabilities of the models that can be learned on the data. WikiGraphs is collected by pairing each Wikipedia article from the established WikiText-103 benchmark with a subgraph from the Freebase knowledge graph. Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets.

3 PAPERS • 1 BENCHMARK

HeriGraph (Multimodal Machine Learning Datasets on Graphs of Heritage Values and Attributes)

The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links) constructed

1 PAPER • NO BENCHMARKS YET

KG dataset

Summarized Wiki Articles with TTL Knowledge Graphs. Overview: This dataset comprises 500 summarized Wikipedia articles, each accompanied by a corresponding TTL knowledge graph. All articles and their associated knowledge graphs are consolidated into a single CSV file named wiki.csv, where each row represents one article. Dataset files: wiki.csv, a CSV file containing all 500 summarized articles and their corresponding TTL knowledge graphs; all_ttl.txt, a text file containing all 500 knowledge graphs. Example usage: the dataset can be used for natural language processing tasks such as text summarization, knowledge graph construction, and information retrieval.

0 PAPER • NO BENCHMARKS YET

ILPC22-Large

…Training graph contains 46K entities, 130 relations, 202K triples. Inference graph contains 30K entities, 130 relations, 77K triples. Validation and test triples to predict belong to the inference graph.

1 PAPER • 1 BENCHMARK

ILPC22-Small

…Training graph contains 10K entities, 96 relations, 78K triples. Inference graph contains 7K entities, 96 relations, 21K triples. Validation and test triples to predict belong to the inference graph.

1 PAPER • 1 BENCHMARK
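The ILPC22 entries above describe a training graph, a separate inference graph with the same number of relations, and held-out triples that belong to the inference graph. As a minimal sketch of what that setup implies, the following checks a toy split for those properties. The function names and the toy triples are hypothetical, and the entity-overlap count is only reported, since the listing does not state whether the two graphs share entities.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def entities(triples: Iterable[Triple]) -> Set[str]:
    """Collect every entity that appears as a head or a tail."""
    return {e for h, _, t in triples for e in (h, t)}


def relations(triples: Iterable[Triple]) -> Set[str]:
    """Collect every relation type used in the triples."""
    return {r for _, r, _ in triples}


def check_inductive_split(train, inference, held_out) -> Dict[str, object]:
    """Report basic properties of a train / inference / held-out split."""
    inference_entities = entities(inference)
    return {
        # Every relation used at inference time was seen during training.
        "relations_seen_in_training":
            (relations(inference) | relations(held_out)) <= relations(train),
        # Validation/test triples only mention inference-graph entities.
        "held_out_entities_in_inference_graph":
            entities(held_out) <= inference_entities,
        # Number of entities shared by the training and inference graphs.
        "entity_overlap_with_training":
            len(entities(train) & inference_entities),
    }


# Toy data (hypothetical, not taken from ILPC22).
train = [("a", "likes", "b"), ("b", "knows", "c")]
inference = [("x", "likes", "y"), ("y", "knows", "z")]
held_out = [("x", "knows", "z")]

report = check_inductive_split(train, inference, held_out)
```

On real data, the same checks could be run after parsing the benchmark's triple files into `(head, relation, tail)` tuples.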

OCB (Open Circuit Benchmark)

OCB contains two graph datasets, Ckt-Bench-101 and Ckt-Bench-301, for representation learning over analog circuits. Both contain directed acyclic graphs (DAGs) that represent analog circuits, together with their graph-level properties: DC gain (Gain), bandwidth (BW), phase margin (PM), Figure of… Tasks: graph-level prediction/regression; analog circuit search (ACS). OCB is the first open-source benchmark for graph learning in analog circuits.

1 PAPER • NO BENCHMARKS YET

SciGraphQA

SciGraphQA is a large-scale, open-domain dataset focused on generating multi-turn conversational question-answering dialogues centered around understanding and describing scientific graphs and figures. Each sample in SciGraphQA consists of a scientific graph image sourced from papers on arXiv, accompanied by rich textual context including the paper's title, abstract, figure caption, and a paragraph… The key motivation behind SciGraphQA is providing a large-scale resource to support research and development of multi-modal AI systems that can engage in informative, open-ended conversations about graphs… Potential use cases of SciGraphQA include pre-training and benchmarking multi-modal conversational models for scientific graph comprehension, building AI assistants that can discuss data insights, and… The academic source material also provides a way to evaluate model capabilities on expert-level graphs spanning diverse topics and complex visual encodings.

3 PAPERS • 1 BENCHMARK

Microsoft Academic Graph

The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences

116 PAPERS • 1 BENCHMARK

Facebook Page-Page

This web graph is a page-page graph of verified Facebook sites. Nodes represent official Facebook pages while the links are mutual likes between sites. This graph was collected through the Facebook Graph API in November 2017 and restricted to pages from 4 categories that are defined by Facebook.

7 PAPERS • NO BENCHMARKS YET

Synthetic Multimodal Dataset for Daily Life Activities

This dataset was originally created for the Knowledge Graph Reasoning Challenge for Social Issues (KGRC4SI)… Video data that simulates daily life actions in a virtual space from Scenario Data. Knowledge graphs and transcriptions of the Video Data content ("who" did what "action" with what "object," when and where, and the resulting "state" or "position" of the object). Knowledge Graph Embedding Data are created for reasoning based on machine learning.

1 PAPER • NO BENCHMARKS YET

IMCPT-SparseGM-50

The IMCPT-SparseGM dataset is a new visual graph matching benchmark addressing partial matching and graphs with larger sizes, based on the novel stereo benchmark Image Matching Challenge PhotoTourism (IMC-PT… This dataset was released with the CVPR 2023 paper "Deep Learning of Partial Graph Matching via Differentiable Top-K".

1 PAPER • 1 BENCHMARK

IMCPT-SparseGM-100

1 PAPER • 1 BENCHMARK

HPO (Human Phenotype Ontology)

The Human Phenotype Ontology (HPO) graph is a standardized vocabulary of human phenotypic abnormalities and their relationships. It represents these abnormalities as nodes in a graph, with edges indicating relationships such as subtypes or overlapping features. The HPO graph is organized in a hierarchical structure, with more general terms at the top and more specific terms at the bottom.

1 PAPER • NO BENCHMARKS YET

Worldtree

Worldtree is a corpus of explanation graphs, explanatory role ratings, and an associated tablestore. It provides explanation graphs for 1,680 questions and 4,950 tablestore rows across 62 semi-structured tables. This data is intended to be paired with the AI2 Mercury Licensed questions.

33 PAPERS • NO BENCHMARKS YET

MIMIC-SPARQL

…EHR data are typically stored in a relational database, which can also be converted to a directed acyclic graph, allowing two approaches for EHR QA: Table-based QA and Knowledge Graph-based QA. MIMIC-SPARQL dataset provides graph-based EHR QA data where natural language queries are converted to SPARQL instead of SQL

2 PAPERS • NO BENCHMARKS YET

NCI109

TUDataset: A collection of benchmark datasets for learning with graphs

69 PAPERS • 1 BENCHMARK

Taskography (PDDLGym Taskography)

PDDL dataset of Rearrangement tasks in large-scale 3D scene graphs.

1 PAPER • NO BENCHMARKS YET

GO21

GO21 is a biomedical knowledge graph that models genes, proteins, drugs, and the hierarchy of the biological processes they participate in. GO21 can be used for knowledge graph completion tasks (link prediction) as well as hierarchical reasoning tasks, such as the ancestor-descendant prediction task proposed in the paper.

1 PAPER • 1 BENCHMARK
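For hierarchical graphs such as the one described above, ground-truth labels for an ancestor-descendant task can be derived by walking parent links upward. The sketch below is a generic illustration, not the method from the GO21 paper; the `parents` mapping and the process names are hypothetical.

```python
from typing import Dict, List, Set


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return all nodes reachable from `node` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_ancestor(candidate: str, node: str, parents: Dict[str, List[str]]) -> bool:
    """True if `candidate` lies above `node` in the hierarchy."""
    return candidate in ancestors(node, parents)


# Hypothetical hierarchy: process_a is the most general term.
parents = {
    "process_b": ["process_a"],
    "process_c": ["process_b"],
    "process_d": ["process_a"],
}

c_ancestors = ancestors("process_c", parents)
a_above_c = is_ancestor("process_a", "process_c", parents)
d_above_c = is_ancestor("process_d", "process_c", parents)
```

An iterative traversal with a `seen` set is used so that nodes with several parents are visited only once.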

HatefulDiscussions

Multi-Modal Hate Speech Detection with Graph Context. 18k+ labels, 8k+ discussions, 900k+ comments.

1 PAPER • NO BENCHMARKS YET

tida-gcn-data

The datasets of "Time Interval-enhanced Graph Neural Network for Shared-account Cross-domain Sequential Recommendation" (TNNLS 2022)

1 PAPER • NO BENCHMARKS YET

LEA-GCN-dataset

The datasets of "Towards Lightweight Cross-domain Sequential Recommendation via External Attention-enhanced Graph Convolution Network" (DASFAA 2023)

1 PAPER • NO BENCHMARKS YET

UPFD (User Preference-aware Fake News Detection)

…The dataset has been integrated with PyTorch Geometric (PyG) and Deep Graph Library (DGL). You can load the dataset after installing the latest versions of PyG or DGL. The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The news retweet graphs were originally extracted by FakeNewsNet. Each graph is a hierarchical tree-structured graph where the root node represents the news; the leaf nodes are Twitter users who retweeted the root news. The dataset statistics are shown below: | Data | #Graphs | #Fake News | #Total Nodes | #Total Edges | #Avg.…

7 PAPERS • 2 BENCHMARKS

VerbCL

VerbCL is a dataset that consists of the citation graph of court opinions, which cite previously published court opinions in support of their arguments. VerbCL is derived from CourtListener and introduces the task of highlight extraction as a single-document summarization task based on the citation graph.

1 PAPER • NO BENCHMARKS YET

PolyDensity (Polymer Density)

…It uses monomers as polymer graphs to predict the property of polymer density.

1 PAPER • NO BENCHMARKS YET

Hateful Users on Twitter

This is a Twitter dataset of 100,386 users along with up to 200 tweets from their timelines, collected with a random-walk-based crawler on the retweet graph, with a subsample of 4,972 users that is manually annotated… The dataset can be used to examine the difference between user activity patterns, the content disseminated between hateful and normal users, and network centrality measurements in the sampled graph.

2 PAPERS • NO BENCHMARKS YET

MeltingTemp (Melting Temperature)

…It uses monomers as polymer graphs to predict the property of polymer melting temperature.

2 PAPERS • NO BENCHMARKS YET

COMETA

Consists of 20k English biomedical entity mentions from Reddit expert-annotated with links to SNOMED CT, a widely-used medical knowledge graph.

18 PAPERS • NO BENCHMARKS YET

CTFW

…It is used to generate flow graphs from procedural texts.

1 PAPER • NO BENCHMARKS YET

SemEval-2021 Task-11

NLPContributionGraph was introduced as Task 11 at SemEval 2021 for the first time. The task is defined on a dataset of Natural Language Processing (NLP) scholarly articles with their contributions structured to be integrable within Knowledge Graph infrastructures such as the Open Research Knowledge Graph.

8 PAPERS • NO BENCHMARKS YET

Figment

A dataset for fine-grained entity typing of knowledge graph entities built from Freebase. It can be used to evaluate entity representations and also mention-level entity typing.

8 PAPERS • NO BENCHMARKS YET

MIB Dataset

…It contains fake and real Twitter accounts and their followers'/friends' IDs, from which a graph can be built.

2 PAPERS • 1 BENCHMARK

Amazon-Fraud (Multi-relational Graph Dataset for Amazon Fraudulent Account Detection)

Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

| … | … Class=1) |
|-------|--------|
| 11,944 | 9.5 |

| Relation | # Edges |
|--------|--------|
| U-P-U | 175,608 |
| U-S-U | 3,566,479 |
| U-V-U | 1,036,737 |
| All | 4,398,392 |

Graph: We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users reviewing at least one same product; 2) U-S-U: it connects users having at least one same star rating within…

6 PAPERS • 2 BENCHMARKS
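The U-P-U relation in the Amazon-Fraud entry connects users who reviewed at least one common product. A minimal sketch of that construction, assuming reviews are available as `(user_id, product_id)` pairs, is given below; the function name and the toy data are hypothetical and do not come from the dataset's own tooling.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Set, Tuple


def build_upu_edges(reviews: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Connect every pair of users that reviewed at least one same product."""
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)

    edges: Set[Tuple[str, str]] = set()
    for users in users_by_product.values():
        # Sorting gives each undirected edge a single canonical form (u < v).
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges


# Toy reviews (hypothetical): u1 and u2 share p1, u1 and u3 share p2.
reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2"), ("u4", "p3")]
upu_edges = build_upu_edges(reviews)
```

Grouping users by product first avoids comparing every pair of users directly, although very popular products can still produce a large number of edges.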

Yelp-Fraud (Multi-relational Graph Dataset for Yelp Spam Review Detection)

Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

| … | … Class=1) |
|-------|--------|
| 45,954 | 14.5 |

| Relation | # Edges |
|--------|--------|
| R-U-R | 49,315 |
| R-T-R | 573,616 |
| R-S-R | 3,402,743 |
| All | 3,846,979 |

Graph: Based on previous studies which show that opinion fraudsters have connections in user, product, review text, and time, we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects…

10 PAPERS • 2 BENCHMARKS

Inception

Inception Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23.

2 PAPERS • NO BENCHMARKS YET

The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on

594 PAPERS • 13 BENCHMARKS

MindReader

MindReader is a novel dataset providing explicit user ratings over a knowledge graph within the movie domain. The latest stable version of the dataset contains 218,794 ratings from 2,316 users over 12,206 entities, and an associated knowledge graph consisting of 18,133 movie-related entities.

1 PAPER • NO BENCHMARKS YET

Two-Path

Two-Path Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23.

2 PAPERS • NO BENCHMARKS YET

SSN (Semantic Scholar Network)

…The entire dataset constitutes a large connected citation graph.

5 PAPERS • NO BENCHMARKS YET

HiAML

HiAML Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23.

2 PAPERS • NO BENCHMARKS YET

Nations

The Nations dataset is a small knowledge graph with 14 entities, 55 relations, and 1992 triples describing countries and their political relationships.

2 PAPERS • NO BENCHMARKS YET

SAGC-A68

SAGC-A68 (A space access graph dataset for the classification of spaces and space elements in apartment buildings)

…Although existing space function classifiers use space adjacency or connectivity graphs as input, the application of Graph Deep Learning (GDL) to space layout element classification has not been extensively… To bridge this gap, we introduce a dataset named SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings designed or built between… Each access graph contains nodes representing spaces and space elements and edges representing the connections between them.

1 PAPER • NO BENCHMARKS YET

GEval for KGRC-RDF-star

This repository contains a (software) evaluation framework to perform evaluation and comparison on RDF-star graph embedding techniques.

1 PAPER • NO BENCHMARKS YET

GlassTemp (Glass Transition Temperature)

…It uses monomers as polymer graphs to predict the property of glass transition temperature.

2 PAPERS • 1 BENCHMARK

$O_2$Perm (Oxygen Permeability)

…It uses monomers as polymer graphs to predict the property of oxygen permeability. Its limited size (595 polymers) makes property prediction challenging.

3 PAPERS • NO BENCHMARKS YET

AISECKG

AISECKG (AISecKG: Knowledge Graph Dataset for Cybersecurity Education)

…Knowledge graphs (KGs) represent data as a graph that supports reasoning over and interpretation of the underlying data, making them suitable for use in education and interactive learning. Creating knowledge graphs from unstructured text is challenging without an ontology or annotated dataset, and data annotation for cybersecurity requires domain experts. This dataset can be used to construct knowledge graphs to teach cybersecurity and promote cognitive learning.

1 PAPER • NO BENCHMARKS YET

Logic2Text

…The logical forms show diversified graph structure of free schema, which poses great challenges on the model's ability to understand the semantics.

8 PAPERS • NO BENCHMARKS YET

DBP-5L (English)

…The dataset is used for the Knowledge Graph Completion and Entity Alignment tasks. DBP-5L (English) is the subset of DBP-5L containing the English KG.

3 PAPERS • 1 BENCHMARK

Amazon Product Data

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

33 PAPERS • 6 BENCHMARKS