49 dataset results for Graphs AND Graphs AND English

WikiGraphs is a dataset of Wikipedia articles, each paired with a knowledge graph, to facilitate research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (one or a few sentences), thus limiting the capabilities of the models that can be learned on the data. WikiGraphs is collected by pairing each Wikipedia article from the established WikiText-103 benchmark with a subgraph from the Freebase knowledge graph. Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets.

3 PAPERS • 1 BENCHMARK

HeriGraph (Multimodal Machine Learning Datasets on Graphs of Heritage Values and Attributes)

The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links) constructed …

1 PAPER • NO BENCHMARKS YET

ILPC22-Large

…Training graph contains 46K entities, 130 relations, 202K triples. Inference graph contains 30K entities, 130 relations, 77K triples. Validation and test triples to predict belong to the inference graph.

1 PAPER • 1 BENCHMARK

ILPC22-Small

…Training graph contains 10K entities, 96 relations, 78K triples. Inference graph contains 7K entities, 96 relations, 21K triples. Validation and test triples to predict belong to the inference graph.

1 PAPER • 1 BENCHMARK
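Both ILPC22 entries report entity, relation, and triple counts for a training graph and an inference graph, and state that validation and test triples belong to the inference graph. The sketch below shows one way to compute those counts and check that membership, assuming a split is stored as (head, relation, tail) string triples; the triple format, helper names, and toy data are assumptions for illustration, not the official ILPC22 loader.

```python
from typing import Iterable, NamedTuple

# A triple is (head entity, relation, tail entity); this format is an assumption.
Triple = tuple[str, str, str]


class GraphStats(NamedTuple):
    entities: int
    relations: int
    triples: int


def graph_stats(triples: Iterable[Triple]) -> GraphStats:
    """Count distinct entities, relations and triples in a triple set."""
    unique = set(triples)
    entities = {h for h, _, _ in unique} | {t for _, _, t in unique}
    relations = {r for _, r, _ in unique}
    return GraphStats(len(entities), len(relations), len(unique))


def belongs_to(graph: Iterable[Triple], queries: Iterable[Triple]) -> bool:
    """True if every query triple uses only entities and relations present in `graph`."""
    graph = list(graph)
    entities = {h for h, _, _ in graph} | {t for _, _, t in graph}
    relations = {r for _, r, _ in graph}
    return all(h in entities and r in relations and t in entities for h, r, t in queries)


# Hypothetical toy triples, not taken from ILPC22.
training = [("a", "r1", "b"), ("b", "r2", "c")]
inference = [("x", "r1", "y"), ("y", "r2", "z"), ("z", "r1", "x")]
validation = [("x", "r2", "z")]

print(graph_stats(inference))  # GraphStats(entities=3, relations=2, triples=3)
print(belongs_to(inference, validation), belongs_to(training, validation))  # True False
```

In this toy example the validation triple only makes sense against the inference graph, since its entities never occur in the training graph.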

Facebook Page-Page

This webgraph is a page-page graph of verified Facebook sites. Nodes represent official Facebook pages while the links are mutual likes between sites. This graph was collected through the Facebook Graph API in November 2017 and restricted to pages from 4 categories which are defined by Facebook.

7 PAPERS • NO BENCHMARKS YET
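The entry defines links as mutual likes between pages. Assuming the raw data were directed (liker, liked) page pairs (an assumption; the released graph may already be undirected), a mutual-like graph keeps a pair only when both directions exist, as in this sketch with hypothetical page IDs.

```python
def mutual_like_edges(likes):
    """Keep an undirected edge {u, v} only if u likes v and v likes u."""
    directed = set(likes)
    return {
        frozenset((u, v))
        for u, v in directed
        if u != v and (v, u) in directed  # both directions must be present
    }


# Hypothetical page IDs, not taken from the Facebook Page-Page data.
likes = [("p1", "p2"), ("p2", "p1"), ("p1", "p3"), ("p3", "p4"), ("p4", "p3")]
edges = mutual_like_edges(likes)
print(sorted(sorted(e) for e in edges))  # [['p1', 'p2'], ['p3', 'p4']]
```

Using `frozenset` makes (p1, p2) and (p2, p1) collapse into a single undirected edge.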

IMCPT-SparseGM-50

The IMCPT-SparseGM dataset is a new visual graph matching benchmark addressing partial matching and larger graphs, based on the novel stereo benchmark Image Matching Challenge PhotoTourism (IMC-PT). This dataset was released in the CVPR 2023 paper "Deep Learning of Partial Graph Matching via Differentiable Top-K".

1 PAPER • 1 BENCHMARK

IMCPT-SparseGM-100

1 PAPER • 1 BENCHMARK

Worldtree

Worldtree is a corpus of explanation graphs, explanatory role ratings, and an associated tablestore. It contains explanation graphs for 1,680 questions and provides 4,950 tablestore rows across 62 semi-structured tables. This data is intended to be paired with the AI2 Mercury Licensed questions.

33 PAPERS • NO BENCHMARKS YET

MIMIC-SPARQL

…EHR data are typically stored in a relational database, which can also be converted to a directed acyclic graph, allowing two approaches for EHR QA: Table-based QA and Knowledge Graph-based QA. The MIMIC-SPARQL dataset provides graph-based EHR QA data in which natural language queries are converted to SPARQL instead of SQL.

2 PAPERS • NO BENCHMARKS YET
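The entry contrasts Table-based QA (SQL over relational tables) with Knowledge Graph-based QA (SPARQL over a graph derived from those tables). The sketch below answers one question both ways on a hypothetical one-table schema: with SQL through Python's built-in `sqlite3`, and with a tiny hand-written triple-pattern matcher standing in for a SPARQL engine. The table, the `has_diagnosis` predicate, and the matcher are illustrative assumptions, not MIMIC-SPARQL's actual schema or tooling.

```python
import sqlite3

# Hypothetical toy data; not the MIMIC-III or MIMIC-SPARQL schema.
rows = [("p1", "sepsis"), ("p2", "pneumonia"), ("p3", "sepsis")]

# Table-based QA: store the rows in a relational table and ask with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diagnoses (patient_id TEXT, diagnosis TEXT)")
conn.executemany("INSERT INTO diagnoses VALUES (?, ?)", rows)
sql_answer = sorted(
    pid for (pid,) in conn.execute(
        "SELECT patient_id FROM diagnoses WHERE diagnosis = ?", ("sepsis",)
    )
)
conn.close()

# Graph-based QA: convert each row into a (subject, predicate, object) triple.
triples = {(pid, "has_diagnosis", dx) for pid, dx in rows}


def match(pattern, triples):
    """Bindings for a single triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # the same variable would need two different values
            elif term != value:
                break  # a constant in the pattern does not match
        else:
            results.append(binding)
    return results


# Plays the role of the SPARQL query  SELECT ?p WHERE { ?p :has_diagnosis "sepsis" }
graph_answer = sorted(b["?p"] for b in match(("?p", "has_diagnosis", "sepsis"), triples))

print(sql_answer, graph_answer)  # ['p1', 'p3'] ['p1', 'p3']
```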

NCI109

TUDataset: A collection of benchmark datasets for learning with graphs

70 PAPERS • 1 BENCHMARK

GO21

GO21 is a biomedical knowledge graph that models genes, proteins, drugs, and the hierarchy of the biological processes they participate in. GO21 can be used for knowledge graph completion tasks (link prediction) as well as hierarchical reasoning tasks, such as the ancestor-descendant prediction task proposed in the paper.

1 PAPER • 1 BENCHMARK
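The ancestor-descendant task asks whether one term lies above another in the biological-process hierarchy. Assuming the hierarchy is given as (child, parent) edges of a directed acyclic graph, ground-truth pairs can be derived from its transitive closure. This is a sketch of that idea with made-up, GO-style term names, not necessarily how the GO21 paper constructs its labels.

```python
from collections import defaultdict


def ancestor_sets(parent_edges):
    """Map every term to the set of all its ancestors, given (child, parent) DAG edges."""
    parents = defaultdict(set)
    for child, parent in parent_edges:
        parents[child].add(parent)

    memo = {}

    def visit(term):
        # Ancestors of a term = its parents plus the ancestors of each parent.
        if term not in memo:
            result = set()
            for parent in parents[term]:
                result.add(parent)
                result |= visit(parent)
            memo[term] = result
        return memo[term]

    terms = {term for edge in parent_edges for term in edge}
    return {term: visit(term) for term in terms}


def is_ancestor(ancestors, candidate, term):
    """Label for one ancestor-descendant query: is `candidate` above `term`?"""
    return candidate in ancestors.get(term, set())


# Hypothetical GO-style hierarchy, not extracted from GO21.
edges = [
    ("glycolysis", "carbohydrate metabolism"),
    ("carbohydrate metabolism", "metabolic process"),
    ("lipid metabolism", "metabolic process"),
]
anc = ancestor_sets(edges)
print(is_ancestor(anc, "metabolic process", "glycolysis"))  # True
print(is_ancestor(anc, "lipid metabolism", "glycolysis"))   # False
```

Memoization means each term's ancestor set is computed once, even when several descendants share it.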

HatefulDiscussions

Multi-Modal Hate Speech Detection with Graph Context. 18k+ labels, 8k+ discussions, 900k+ comments.

1 PAPER • NO BENCHMARKS YET

UPFD (User Preference-aware Fake News Detection)

…The dataset has been integrated with PyTorch Geometric (PyG) and Deep Graph Library (DGL). You can load the dataset after installing the latest versions of PyG or DGL. The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The news retweet graphs were originally extracted by FakeNewsNet. Each graph is a hierarchical tree-structured graph where the root node represents the news and the leaf nodes are Twitter users who retweeted the root news. The dataset statistics are shown below: | Data | #Graphs | #Fake News | #Total Nodes | #Total Edges | #Avg. …

7 PAPERS • 2 BENCHMARKS
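Each UPFD graph is described as a tree rooted at the news node, with retweeting users as leaves. A minimal check of that property, assuming a cascade is stored as hypothetical child-to-parent pointers, is that a graph with n nodes has n - 1 edges and every node is reachable from the root. The dataset itself is distributed through PyG and DGL; this sketch uses plain Python rather than either library's loader.

```python
from collections import deque


def cascade_edges(parent_of):
    """Turn hypothetical child -> parent pointers into (parent, child) edges."""
    return [(parent, child) for child, parent in parent_of.items()]


def is_rooted_tree(edges, root):
    """True if the graph has n - 1 edges and every node is reachable from `root`."""
    nodes = {root} | {node for edge in edges for node in edge}
    if len(edges) != len(nodes) - 1:
        return False
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    # Breadth-first search from the root over parent -> child edges.
    seen, queue = {root}, deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen == nodes


# Hypothetical cascade: three users retweeted the root news item.
edges = cascade_edges({"user1": "news", "user2": "news", "user3": "news"})
leaves = sorted({c for _, c in edges} - {p for p, _ in edges})
print(is_rooted_tree(edges, "news"), leaves)  # True ['user1', 'user2', 'user3']
print(is_rooted_tree(edges + [("user1", "user2")], "news"))  # False
```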

Hateful Users on Twitter

This is a Twitter dataset of 100,386 users, along with up to 200 tweets from each user's timeline, collected with a random-walk-based crawler on the retweet graph; a subsample of 4,972 users is manually annotated. The dataset can be used to examine the differences in activity patterns and in the content disseminated between hateful and normal users, as well as network centrality measurements in the sampled graph.

2 PAPERS • NO BENCHMARKS YET
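The Hateful Users data were collected with a random-walk-based crawler on the retweet graph. The sketch below shows a generic random walk with restarts that stops once a target number of distinct users has been visited; the restart probability, the step cap, and the toy ring-shaped adjacency list are assumptions, and the original crawler's exact rules may differ.

```python
import random


def random_walk_sample(adj, start, target_size, restart_prob=0.15, seed=0):
    """Visit users by a random walk with restarts until `target_size` distinct users are seen.

    `adj` maps each user to the users they are linked to in the retweet graph.
    `restart_prob` and the step cap below are illustrative assumptions.
    """
    rng = random.Random(seed)
    visited = {start}
    current = start
    max_steps = 100 * target_size  # guard against walks that can never grow the sample
    for _ in range(max_steps):
        if len(visited) >= target_size:
            break
        neighbours = adj.get(current, [])
        if not neighbours or rng.random() < restart_prob:
            current = start  # jump back to the seed user
        else:
            current = rng.choice(neighbours)
            visited.add(current)
    return visited


# Hypothetical retweet graph: six users arranged in a ring.
users = ["u0", "u1", "u2", "u3", "u4", "u5"]
adj = {u: [users[i - 1], users[(i + 1) % 6]] for i, u in enumerate(users)}
sample = random_walk_sample(adj, "u0", target_size=4)
print(sorted(sample))
```

Restarts keep the sample concentrated around the seed user, which is one reason a crawled sample can differ structurally from the full graph.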

CTFW

…It is used to generate flow graphs from procedural texts.

1 PAPER • NO BENCHMARKS YET

Amazon-Fraud (Multi-relational Graph Dataset for Amazon Fraudulent Account Detection)

Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

| # Nodes | Fraud % (Class=1) |
|--------|--------|
| 11,944 | 9.5 |

| Relation | # Edges |
|--------|--------|
| U-P-U | 175,608 |
| U-S-U | 3,566,479 |
| U-V-U | 1,036,737 |
| All | 4,398,392 |

Graph: We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users who reviewed at least one common product; 2) U-S-U: it connects users who gave at least one identical star rating within …

6 PAPERS • 2 BENCHMARKS
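In the Amazon-Fraud relation table, the All row (4,398,392 edges) is smaller than the sum of the three relations (175,608 + 3,566,479 + 1,036,737 = 4,778,824). This is consistent with All counting each user pair once even when several relations connect it, which is an interpretation rather than something the entry states. The sketch contrasts the two counts on hypothetical, canonically ordered undirected user pairs.

```python
def undirected(u, v):
    """Store an undirected user pair in a canonical (smaller, larger) order."""
    return (u, v) if u <= v else (v, u)


def edge_counts(relations):
    """Return (sum of per-relation edge counts, number of distinct edges in their union)."""
    per_relation_total = sum(len(edges) for edges in relations.values())
    merged = set().union(*relations.values())
    return per_relation_total, len(merged)


# Hypothetical user IDs; pairs (0, 1) and (2, 3) each appear under two relations.
relations = {
    "U-P-U": {undirected(0, 1), undirected(2, 1)},
    "U-S-U": {undirected(1, 0), undirected(0, 2), undirected(2, 3)},
    "U-V-U": {undirected(3, 2)},
}
print(edge_counts(relations))  # (6, 4)
```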

Yelp-Fraud (Multi-relational Graph Dataset for Yelp Spam Review Detection)

Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

| # Nodes | Fraud % (Class=1) |
|--------|--------|
| 45,954 | 14.5 |

| Relation | # Edges |
|--------|--------|
| R-U-R | 49,315 |
| R-T-R | 573,616 |
| R-S-R | 3,402,743 |
| All | 3,846,979 |

Graph: Based on previous studies which show that opinion fraudsters have connections in user, product, review text, and time, we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects …

10 PAPERS • 2 BENCHMARKS

Inception

Inception Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23.

2 PAPERS • NO BENCHMARKS YET

Reddit

The Reddit dataset is a graph dataset built from Reddit posts made in September 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on …

595 PAPERS • 13 BENCHMARKS
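The Reddit snippet is cut off, but the construction it describes links two posts through a shared commenter. Assuming two posts are connected when at least one user commented on both, the post-to-post graph can be built from (user, post) comment records as follows; the records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def post_graph(comments):
    """Build undirected post-post edges from (user, post) comment records.

    Two posts are linked if at least one user commented on both of them.
    """
    posts_by_user = defaultdict(set)
    for user, post in comments:
        posts_by_user[user].add(post)
    edges = set()
    for posts in posts_by_user.values():
        # Every pair of posts sharing this commenter becomes an edge (stored sorted).
        edges.update(combinations(sorted(posts), 2))
    return edges


# Hypothetical comment records, not taken from the Reddit dataset.
comments = [("alice", "p1"), ("alice", "p2"), ("bob", "p2"), ("bob", "p3"), ("carol", "p4")]
print(sorted(post_graph(comments)))  # [('p1', 'p2'), ('p2', 'p3')]
```

A single prolific commenter produces a clique over all of their posts, so the number of edges grows quadratically in the number of posts per user.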

SSN (Semantic Scholar Network)

…The entire dataset constitutes a large connected citation graph.

5 PAPERS • NO BENCHMARKS YET

Two-Path

Two-Path Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23.

2 PAPERS • NO BENCHMARKS YET

HiAML

HiAML Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23.

2 PAPERS • NO BENCHMARKS YET

Nations

The Nations dataset is a small knowledge graph with 14 entities, 55 relations, and 1992 triples describing countries and their political relationships.

2 PAPERS • NO BENCHMARKS YET

GEval for KGRC-RDF-star

This repository contains a software framework for evaluating and comparing RDF-star graph embedding techniques.

1 PAPER • NO BENCHMARKS YET

GlassTemp (Glass Transition Temperature)

…It represents monomers as polymer graphs to predict the glass transition temperature.

2 PAPERS • 1 BENCHMARK

DBP-5L (English)

…The dataset is used for the Knowledge Graph Completion and Entity Alignment tasks. DBP-5L (English) is the subset of DBP-5L containing the English KG.

3 PAPERS • 1 BENCHMARK

Amazon Product Data

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

33 PAPERS • 6 BENCHMARKS

FrameNet

FrameNet is a linguistic knowledge graph containing information about lexical and predicate argument semantics of the English language.

435 PAPERS • NO BENCHMARKS YET

CHILI-3K

The CHILI-3K dataset is a medium-scale graph dataset (with overall >6M nodes, >49M edges) of mono-metallic oxide nanomaterials generated from 12 selected crystal types.

1 PAPER • 8 BENCHMARKS

KGRC-RDF-star

KGRC-RDF-star is an RDF-star dataset converted from KGRC-RDF, a knowledge graph dataset of novel stories. KGRC-RDF-star is a complex RDF-star graph dataset that contains nested structures of statements and scenes, e.g., "Person A said 'Person B saw "Person C was in D"'".

1 PAPER • NO BENCHMARKS YET
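RDF-star lets a whole triple act as the subject or object of another triple, which is how nested statements such as the "Person A said / Person B saw / Person C was in D" example can be represented. The sketch models quoted triples as nested Python tuples, measures nesting depth, and prints the statement with the `<< s p o >>` quoted-triple syntax of the RDF-star community-group draft (Turtle-star). The `:`-prefixed names are hypothetical, not identifiers from KGRC-RDF-star, and the `@prefix` declaration a real Turtle file needs is omitted.

```python
def quote(term):
    """Render a term; nested triples use RDF-star's << s p o >> quoted-triple syntax."""
    if isinstance(term, str):
        return term
    subject, predicate, obj = term
    return f"<< {quote(subject)} {predicate} {quote(obj)} >>"


def assert_statement(triple):
    """Render an asserted (top-level) triple as one Turtle-star line."""
    subject, predicate, obj = triple
    return f"{quote(subject)} {predicate} {quote(obj)} ."


def nesting_depth(term):
    """0 for a plain resource; 1 + deepest nested subject/object for a triple."""
    if isinstance(term, str):
        return 0
    subject, _, obj = term
    return 1 + max(nesting_depth(subject), nesting_depth(obj))


# Person A said that Person B saw that Person C was in D (hypothetical names).
statement = (":PersonA", ":said", (":PersonB", ":saw", (":PersonC", ":wasIn", ":D")))
print(assert_statement(statement))
# :PersonA :said << :PersonB :saw << :PersonC :wasIn :D >> >> .
print(nesting_depth(statement))  # 3
```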

Wiki-One

This dataset is a Wikipedia dump, split by relations to perform Few-Shot Knowledge Graph Completion.

15 PAPERS • NO BENCHMARKS YET

WyzeRule

…NeurIPS Dataset Track 2023 [2] FedRule: Federated Rule Recommendation System with Graph Neural Networks. IoTDI 2023

2 PAPERS • NO BENCHMARKS YET

Netzschleuder (network catalogue, repository and centrifuge)

…This website is meant to be browsed both by humans and machines alike, and can also be accessed via a convenient JSON API, or via the graph-tool library. The network datasets themselves are available in several machine-readable formats, in particular gt, GraphML, GML and CSV.

3 PAPERS • NO BENCHMARKS YET

VirtualHome2KG

VirtualHome2KG is a system for constructing and augmenting knowledge graphs (KGs) of daily living activities using virtual space. We also provide an ontology to describe the structure of the KGs.

2 PAPERS • NO BENCHMARKS YET

CHILI-100K

The CHILI-100K dataset is a large-scale graph dataset (with overall >183M nodes, >1.2B edges) of nanomaterials generated from experimentally determined crystal structures.

1 PAPER • 8 BENCHMARKS

CellTypeGraph Benchmark

…We here abstract the problem into a new benchmark for node classification in a geo-referenced graph. Solving it requires learning the spatial layout of the organ including symmetries.

1 PAPER • 1 BENCHMARK

InferWiki

InferWiki is a Knowledge Graph Completion (KGC) dataset that improves upon existing benchmarks in inferential ability, assumptions, and patterns.

4 PAPERS • NO BENCHMARKS YET

ChEBI-20

…Given a text query and a list of molecules without any reference textual information (represented, for example, as SMILES strings, graphs, or other equivalent representations), retrieve the molecule corresponding … This requires the integration of two very different types of information: the structured knowledge represented by text and the chemical properties present in molecular graphs.

22 PAPERS • 4 BENCHMARKS
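The ChEBI-20 retrieval setting ranks candidate molecules against a text query. One common way to do this, assumed here rather than taken from the entry, is to embed text and molecules in a shared vector space, rank candidates by cosine similarity, and score the ranking with mean reciprocal rank (MRR). The three-dimensional vectors below are made up; a real system would obtain them from trained text and molecule encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_molecules(query_vec, molecule_vecs):
    """Molecule IDs ordered from most to least similar to the text-query embedding."""
    return sorted(molecule_vecs, key=lambda m: cosine(query_vec, molecule_vecs[m]), reverse=True)


def mean_reciprocal_rank(rankings, gold_ids):
    """Average of 1 / (position of the correct molecule), positions counted from 1."""
    return sum(1 / (r.index(g) + 1) for r, g in zip(rankings, gold_ids)) / len(gold_ids)


# Made-up 3-d embeddings; real ones would come from trained text/molecule encoders.
query = [1.0, 0.0, 0.5]
molecules = {
    "mol_a": [0.9, 0.1, 0.4],
    "mol_b": [0.0, 1.0, 0.0],
    "mol_c": [0.5, 0.5, 0.5],
}
ranking = rank_molecules(query, molecules)
print(ranking)  # ['mol_a', 'mol_c', 'mol_b']
print(mean_reciprocal_rank([ranking], ["mol_c"]))  # 0.5
```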

TextWorld KG

TextWorld KG is a dynamic Knowledge Graph (KG) extraction dataset. It is based on a set of text-based games generated using …

1 PAPER • NO BENCHMARKS YET

Rent3D++

…The floorplans are annotated with room outline polygons, doors/windows as line segments, object icons as axis-aligned bounding boxes, room-door-room connectivity graphs, and photo-room assignments. Room-door-room connectivity graphs were generated for the floorplans, and all windows, doors, and other wall openings were annotated and associated with their corresponding rooms.

2 PAPERS • 1 BENCHMARK
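A room-door-room connectivity graph has rooms as nodes and an edge wherever a door joins two rooms. Assuming doors are available as (room, room) pairs (a hypothetical format, not the released Rent3D++ annotation schema), the graph supports simple spatial queries such as the minimum number of doors between two rooms, found with breadth-first search.

```python
from collections import deque


def connectivity_graph(doors):
    """Adjacency sets from (room_a, room_b) pairs, one pair per door."""
    adj = {}
    for a, b in doors:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def doors_between(adj, start, goal):
    """Fewest doors passed from `start` to `goal` (breadth-first search), or None."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        if room == goal:
            return dist[room]
        for nxt in adj.get(room, ()):
            if nxt not in dist:
                dist[nxt] = dist[room] + 1
                queue.append(nxt)
    return None


# Hypothetical floorplan, not an annotation from Rent3D++.
doors = [("entrance", "living"), ("living", "kitchen"),
         ("living", "bedroom"), ("bedroom", "bathroom")]
adj = connectivity_graph(doors)
print(doors_between(adj, "entrance", "bathroom"))  # 3
print(doors_between(adj, "kitchen", "garage"))     # None
```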

l2d (Learning to Dance)

…It contains multimodal data (visual data, temporal graphs and audio) carefully selected from publicly available videos of dancers performing representative movements of the music style, and audio data from …

1 PAPER • NO BENCHMARKS YET

LDC2017T10

LDC2017T10 (Abstract Meaning Representation (AMR) Annotation Release 2.0)

…Each sentence is paired with a graph that represents its whole-sentence meaning in a tree structure.

27 PAPERS • 2 BENCHMARKS
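AMR annotations are commonly written in PENMAN notation, a bracketed, tree-like serialization. The sketch parses a small subset of it into (source, role, target) triples, using the widely cited example "The boy wants to go". Because the variable `b` is reused, the boy node receives two incoming edges, so the structure is a graph even though it is written as a tree. The parser is a hypothetical illustration (no quoted strings, no inversion of `-of` roles), not the official tooling for these LDC corpora.

```python
import re
from collections import Counter


def parse_penman(text):
    """Parse a small PENMAN subset into (source, role, target) triples.

    Supports (var / concept :role target ...) with nested nodes and bare
    variable/constant targets; quoted strings and -of role inversion are not handled.
    """
    tokens = re.findall(r"\(|\)|/|[^\s()/]+", text)
    pos = 0
    triples = []

    def expect(token):
        nonlocal pos
        if tokens[pos] != token:
            raise ValueError(f"expected {token!r} at token {pos}, got {tokens[pos]!r}")
        pos += 1

    def node():
        nonlocal pos
        expect("(")
        var = tokens[pos]
        pos += 1
        expect("/")
        triples.append((var, ":instance", tokens[pos]))
        pos += 1
        while tokens[pos] != ")":
            role = tokens[pos]
            pos += 1
            if tokens[pos] == "(":
                target = node()  # nested node: recurse and use its variable
            else:
                target = tokens[pos]  # re-used variable or constant
                pos += 1
            triples.append((var, role, target))
        expect(")")
        return var

    node()
    return triples


# "The boy wants to go."
amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))"
triples = parse_penman(amr)
incoming = Counter(t for _, role, t in triples if role != ":instance")
variables = {s for s, role, _ in triples if role == ":instance"}
reentrant = sorted(v for v in variables if incoming[v] > 1)
print(reentrant)  # ['b']
```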

LDC2020T02

LDC2020T02 (Abstract Meaning Representation (AMR) Annotation Release 3.0)

…Each sentence is paired with a graph that represents its whole-sentence meaning in a tree structure.

9 PAPERS • 2 BENCHMARKS

MuMiN

MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags), spanning 21 million tweets belonging to 26 thousand Twitter threads, each …

4 PAPERS • 3 BENCHMARKS

SupplyGraph (SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks)

Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision.

1 PAPER • NO BENCHMARKS YET

Ocean Drifters (Madagascar Ocean Drifters)

…They considered all triangles (3-cliques) in this graph to be faces of the simplicial complex. The Laplacian $L^s_1$ of the resulting complex has a two-dimensional harmonic space.

2 PAPERS • NO BENCHMARKS YET
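For the unnormalized combinatorial Hodge 1-Laplacian $L_1 = B_1^\top B_1 + B_2 B_2^\top$, where $B_1$ is the node-edge incidence matrix and $B_2$ the edge-triangle incidence matrix, the harmonic space is the kernel of $L_1$, and its dimension equals the number of one-dimensional holes in the complex. The sketch builds the complex as the Ocean Drifters entry describes (every 3-clique becomes a face) and computes $\dim\ker L_1$ with exact rational arithmetic on made-up graphs; the last one, two squares sharing an edge, has two holes. It uses the unnormalized Laplacian, whereas the Laplacian mentioned in the entry may be a normalized variant.

```python
from fractions import Fraction
from itertools import combinations


def rank(matrix):
    """Rank of a matrix given as a list of rows, via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def harmonic_dimension(nodes, edges):
    """dim ker L1 for the complex whose 2-faces are all 3-cliques of the graph."""
    node_index = {v: i for i, v in enumerate(sorted(nodes))}
    edges = sorted(tuple(sorted(e)) for e in edges)
    edge_index = {e: k for k, e in enumerate(edges)}
    edge_set = set(edges)
    triangles = [t for t in combinations(sorted(nodes), 3)
                 if {(t[0], t[1]), (t[0], t[2]), (t[1], t[2])} <= edge_set]
    m, t = len(edges), len(triangles)
    # B1 (nodes x edges): edge (i, j) with i < j is oriented from i to j.
    b1 = [[0] * m for _ in node_index]
    for k, (i, j) in enumerate(edges):
        b1[node_index[i]][k] = -1
        b1[node_index[j]][k] = 1
    # B2 (edges x triangles): boundary of [a, b, c] is [b, c] - [a, c] + [a, b].
    b2 = [[0] * t for _ in edges]
    for k, (a, b, c) in enumerate(triangles):
        b2[edge_index[(a, b)]][k] = 1
        b2[edge_index[(b, c)]][k] = 1
        b2[edge_index[(a, c)]][k] = -1
    # L1 = B1^T B1 + B2 B2^T, an edges x edges matrix.
    l1 = [[sum(row[p] * row[q] for row in b1)
           + sum(b2[p][k] * b2[q][k] for k in range(t))
           for q in range(m)] for p in range(m)]
    return m - rank(l1)


square = [(0, 1), (1, 2), (2, 3), (0, 3)]
print(harmonic_dimension(range(4), square))             # 1: one hole
print(harmonic_dimension(range(4), square + [(0, 2)]))  # 0: both triangles are filled
two_squares = square + [(1, 4), (4, 5), (2, 5)]
print(harmonic_dimension(range(6), two_squares))        # 2: two holes
```

Exact `Fraction` arithmetic avoids the tolerance choices a floating-point rank computation would need on these small examples.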

UMLS

UMLS (Unified Medical Language System)

…- National Library of Medicine. https://www.nlm.nih.gov/research/umls/index.html. (2) UMLS Metathesaurus Browser. https://uts.nlm.nih.gov/uts/umls/home. (3) GitHub - dongwookim-ml/kg-data: knowledge-graph

18 PAPERS • 1 BENCHMARK

ReviewRobot Dataset

…KGs: The KGs folder contains the knowledge graphs built on the IE_result. back_kg: The back_kg folder contains the background KGs built up to a certain year. For each year, there are three files. Take 2012 as an example: * 2012.pkl contains the background knowledge graph up to (and including) 2012.

1 PAPER • NO BENCHMARKS YET