🔔 Share your dataset with the ML community!

Filter by Modality (clear)

Filter by Task

Filter by Language

239 dataset results for Graphs

UPFD-GOS (User Preference-aware Fake News Detection)

The Gossipcop variant of the UPFD dataset for benchmarking.

3 PAPERS • 1 BENCHMARK

WikiGraphs

WikiGraphs is a dataset of Wikipedia articles each paired with a knowledge graph, to facilitate the research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (1 or few sentences), thus limiting the capabilities of the models that can be learned on the data.

3 PAPERS • 1 BENCHMARK

Biographical

Biographical (Biographical: A Semi-Supervised Relation Extraction Dataset)

Biographical is a semi-supervised dataset for RE. The dataset, which is aimed towards digital humanities (DH) and historical research, is automatically compiled by aligning sentences from Wikipedia articles with matching structured data from sources including Pantheon and Wikidata.

2 PAPERS • NO BENCHMARKS YET

DeepNets-1M

The DeepNets-1M dataset is composed of neural network architectures represented as graphs where nodes are operations (convolution, pooling, etc.) and edges correspond to the forward pass flow of data through the network. DeepNets-1M has 1 million training architectures and 1402 in-distribution (ID) and out-of-distribution (OOD) evaluation architectures: 500 validation and 500 testing ID architectures, 100 wide OOD architectures, 100 deep OOD architectures, 100 dense OOD architectures, 100 OOD archtectures without batch normalization, and 2 predefined architectures (ResNet-50 and 12 layer Visual Transformer).

2 PAPERS • NO BENCHMARKS YET

GDSC (Genomics of Drug Sensitivity in Cancer)

We have characterized 1000 human cancer cell lines and screened them with 100s of compounds. On this website, you will find drug response data and genomic markers of sensitivity.

2 PAPERS • 1 BENCHMARK

GlassTemp (Glass Transition Temperature)

The GlassTemp dataset is collected from Polyinfo. It uses monomers as polymer graphs to predict the property of glass transition temperature. The glass transition temperature of the material itself denotes the temperature range over which this glass transition takes place.

2 PAPERS • 1 BENCHMARK

Hateful Users on Twitter

This is a Twitter dataset of 100,386 users along with up to 200 tweets from their timelines with a random-walk-based crawler on the retweet graph, with a subsample of 4,972 which is manually annotated as hateful or not through crowdsourcing. The dataset can be used to examine the difference between user activity patterns, the content disseminated between hateful and normal users, and network centrality measurements in the sampled graph.

2 PAPERS • NO BENCHMARKS YET

HiAML

HiAML Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23. Contains 4.6k CIFAR-10 networks with an accuracy range of [91.11%, 93.44%].

2 PAPERS • NO BENCHMARKS YET

Inception

Inception Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23. Contains 580 CIFAR-10 networks with an accuracy range of [89.08%, 94.03%].

2 PAPERS • NO BENCHMARKS YET

Large-scale Ridesharing DARP Instances Based on Real Travel Demand

This dataset presents a set of large-scale ridesharing Dial-a-Ride Problem (DARP) instances. The instances were created as a standardized set of ridesharing DARP problems for the purpose of benchmarking and comparing different solution methods.

2 PAPERS • NO BENCHMARKS YET

MIMIC-SPARQL

Question Answering (QA) is a widely-used framework for developing and evaluating an intelligent machine. In this light, QA on Electronic Health Records (EHR), namely EHR QA, can work as a crucial milestone toward developing an intelligent agent in healthcare. EHR data are typically stored in a relational database, which can also be converted to a directed acyclic graph, allowing two approaches for EHR QA: Table-based QA and Knowledge Graph-based QA.

2 PAPERS • NO BENCHMARKS YET

MarKG

MarKG (Multimodal analogical reasoning Knowledge Graph)

The MarKG dataset has 11,292 entities, 192 relations and 76,424 images, including 2,063 analogy entities and 27 analogy relations. The original intention of MarKG is to provide prior knowledge of analogy entities and relations for better multimodal analogical reasoning.

2 PAPERS • NO BENCHMARKS YET

MetaVD (Meta Video Dataset)

MetaVD is a Meta Video Dataset for enhancing human action recognition datasets. It provides human-annotated relationship labels between action classes across human action recognition datasets. MetaVD is proposed in the following paper: Yuya Yoshikawa, Yutaro Shigeto, and Akikazu Takeuchi. "MetaVD: A Meta Video Dataset for enhancing human action recognition datasets." Computer Vision and Image Understanding 212 (2021): 103276. [link]

2 PAPERS • NO BENCHMARKS YET

Nations

The Nations dataset is a small knowledge graph with 14 entities, 55 relations, and 1992 triples describing countries and their political relationships. This dataset is available for download from https://github.com/ZhenfengLei/KGDatasets.

2 PAPERS • NO BENCHMARKS YET

Ocean Drifters (Madagascar Ocean Drifters)

From Schaub, Michael T., et al. "Random walks on simplicial complexes and the normalized hodge 1-laplacian." SIAM Review 62.2 (2020): 353-391.

2 PAPERS • NO BENCHMARKS YET

PACE 2016 Feedback Vertex Set

PACE 2016 Feedback Vertex Set (PACE 2016 Track B, Feedback Vertex Set)

This is the dataset used in the PACE 2016 challenge, Track B, which was computing minimal Feedback Vertex Set. This competition focused on exact solutions, i.e. provably minimal feedback vertex sets (and no heuristic solutions). This should not be confused with the PACE 2022 challenge, which focused on directed feedback vertex set, and has its own entries on PapersWithCode (exact and heuristic).

2 PAPERS • NO BENCHMARKS YET

PACE 2022 Exact

PACE 2022 Exact (PACE 2022 Directed Feedback Vertex Set, Exact Track)

This is the set of graphs used in the PACE 2022 challenge for computing the Directed Feedback Vertex Set, from the Exact track. It consists of 200 labelled directed graphs. The graphs range in size up to from N=512 up to N=131072 vertices, and up to 1315170 edges. The graphs are mostly not symmetric (an edge form u->v does not imply an edge from v->u), although some are symmetric. The graph labels are integers ranging from 1 to N.

2 PAPERS • NO BENCHMARKS YET

Placenta

Placenta is a benchmark dataset for node classification in an underexplored domain: predicting microanatomical tissue structures from cell graphs in placenta histology whole slide images. Cell graphs are large (>1 million nodes per image), node features are varied (64-dimensions of 11 types of cells), class labels are imbalanced (9 classes ranging from 0.21% of the data to 40.0%), and cellular communities cluster into heterogeneously distributed tissues of widely varying sizes (from 11 nodes to 44,671 nodes for a single structure).

2 PAPERS • 1 BENCHMARK

PointPattern

PointPattern is a graph classification dataset constructed by simple point patterns from statistical mechanics. The authors simulated three point patterns in 2D: hard disks in equilibrium (HD), Poisson point process, and random sequential adsorption (RSA) of disks. The HD and Poisson distributions can be seen as simple models that describe the microstructures of liquids and gases while the RSA is a nonequilibrium stochastic process that introduces new particles one by one subject to nonoverlapping conditions.

2 PAPERS • NO BENCHMARKS YET

Rent3D++

Rent3D++ is an extension of the Rent3D floorplans + photos dataset. The floorplans are annotated with room outline polygons, doors/windows as line segments, object-icons as axis-aligned bounding boxes, room-door-room connectivity graphs, and photo-room assignments. We have extracted rectified surface crops from architectural surfaces in photos, and these can drive interior texturing/material modeling tasks. This dataset can be used with our paper Plan2Scene to generate textured 3D mesh models of houses using floorplans and photos.

2 PAPERS • 1 BENCHMARK

SLNET

SLNET (SLNET: A Redistributable Corpus of 3rd-party Simulink Models)

SLNET is collection of third party Simulink models. It is curated via mining open source repository (GitHub and Matlab Central) using SLNET-Miner (https://github.com/50417/SLNet_Miner).

2 PAPERS • NO BENCHMARKS YET

TRN (Toulouse Road Network)

The Toulouse Road Network dataset describes patches of road maps from the city of Toulouse, represented both as spatial graphs G = (A, X) and as grayscale segmentation images.

2 PAPERS • 1 BENCHMARK

TTE-A&O (Travel Time Estimation: Abakan and Omsk)

The dataset includes two parts corresponding to the cities of Abakan (65524 nodes, 340012 edges) and Omsk (231688 nodes, 1149492 edges). Along with the road network graph, it includes trip records represented as sequences of visited nodes (making the dataset suitable both for path-blind and path-aware settings). There are two types of target values for a regression task: real travel time and real length of a trip.

2 PAPERS • 1 BENCHMARK

Two-Path

Two-Path Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23. Contains 6.9k CIFAR-10 networks with an accuracy range of [85.53%, 92.34%].

2 PAPERS • NO BENCHMARKS YET

UPFD-POL (User Preference-aware Fake News Detection)

The PolitiFact variant of the UPFD dataset for benchmarking.

2 PAPERS • 1 BENCHMARK

VirtualHome2KG

VirtualHome2KG is a system for constructing and augmenting knowledge graphs (KGs) of daily living activities using virtual space. We also provide an ontology to describe the structure of the KGs. We used VirtualHome as a platform of virtual space simulation. Thus, this repository is an extension of the virtualhome. Please see the original repository of the virtualhome for details of the Unity simulation.

2 PAPERS • NO BENCHMARKS YET

WyzeRule

Wyze Rule Recommendation Dataset. It is a big dataset with 300,000 users. Please cite [1] if you used the dataset and cite [2] if you referenced the algorithm.

2 PAPERS • NO BENCHMARKS YET

maze-dataset

This package provides utilities for generation, filtering, solving, visualizing, and processing of mazes for training ML systems. Primarily built for the maze-transformer interpretability project. You can find our paper on it here: http://arxiv.org/abs/2309.10498

2 PAPERS • NO BENCHMARKS YET

2D_NACA_RANS

Dataset of low fidelity resolutions of the RANS equations over airfoils.

1 PAPER • NO BENCHMARKS YET

AutoFR Dataset

AutoFR Dataset is broken down by each site that we crawl within a zip file. It contains multiple different experiments that we conducted in our paper. The overall dataset contains 1042 sites that we crawled where we detected ads within the Top-5K.

1 PAPER • NO BENCHMARKS YET

BeGin

BeGin provides 23 benchmark scenarios for graph from 14 real-world datasets, which cover 12 combinations of the incremental settings and the levels of problem. In addition, BeGin provides various basic evaluation metrics for measuring the performances and final evalution metrics designed for continual learning.

1 PAPER • NO BENCHMARKS YET

Building air quality and pandemic risk simulation

The original paper contains a high-level explanation of the dataset characteristics, and potential use cases of the dataset. ArchABM can help to quantify the impact of some of these building- and company policy-related measures.

1 PAPER • NO BENCHMARKS YET

CHILI-100K

The CHILI-100K dataset is a large-scale graph dataset (with overall >183M nodes, >1.2B edges) of nanomaterials generated from experimentally determined crystal structures. The crystal structures used in CHILI-100K are obtained from a curated subset from the Crystallography Open Database (COD) and has a broad chemical scope covering database entries for 68 metals and 11 non-metals.

1 PAPER • 8 BENCHMARKS

CHILI-3K

The CHILI-3K dataset is a medium-scale graph dataset (with overall >6M nodes, >49M edges) of mono-metallic oxide nanomaterials generated from 12 selected crystal types. This dataset has a narrow chemical scope focused on an interesting part of chemical space with a lot of active research.

1 PAPER • 8 BENCHMARKS

CIRO experimental results

Description This repository includes the experiment results, source code, and test data for Three Cs risk inference, using the CIRO (COVID-19 Infection Risk Ontology) and HermiT.

1 PAPER • NO BENCHMARKS YET

CTFW

CTFW is a large annotated procedural text dataset in the cybersecurity domain (3154 documents). It is used to generate flow graphs from procedural texts.

1 PAPER • NO BENCHMARKS YET

CellTypeGraph Benchmark

Classifying all cells in an organ is a relevant and difficult problem from plant developmental biology. We here abstract the problem into a new benchmark for node classification in a geo-referenced graph. Solving it requires learning the spatial layout of the organ including symmetries. To allow the convenient testing of new geometrical learning methods, the benchmark of Arabidopsis thaliana ovules is made available as a PyTorch data loader, along with a large number of precomputed features.

1 PAPER • 1 BENCHMARK

ChEMBL

ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.

1 PAPER • NO BENCHMARKS YET

Classic ECN AQM Fall-Back

Clickable heat-map visualizations of the experiments run to quantify the Classic ECN AQM problem and to evaluate the success of the Classic AQM Detection and Fall-back algorithm.

1 PAPER • NO BENCHMARKS YET

Color-connectivity

Synthetic graph classification datasets with the task of recognizing the connectivity of same-colored nodes in 4 graphs of varying topology.

1 PAPER • NO BENCHMARKS YET

DBP-5L (Spanish)

DPB-5L is a Multilingual KG dataset containing 5 KGs in English, French, Japanese, Greek, and Spanish. The dataset is used for the Knowledge Graph Completion and Entity Alignment task. DPB-5L (Spanish) is a subset of DPB-5L with Spanish KG.

1 PAPER • NO BENCHMARKS YET

DEAP City Dataset

Main Dataset city_pollution_data.csv

1 PAPER • NO BENCHMARKS YET

DPPIN

DPPIN is a collection of dynamic networks, which consists of twelve generated dynamic protein-protein interaction networks of yeast cells, stored in twelve folders.

1 PAPER • NO BENCHMARKS YET

FB1.5M

The FB1.5M dataset is a benchmark for Knowledge Graph Completion. It is based on Freebase and it contains 30 relations with less than 500 triplets as low-resource relations.

1 PAPER • NO BENCHMARKS YET

FireRisk (FireRisk: A Remote Sensing Dataset for Fire Risk Assessment)

In this work, we propose a novel remote sensing dataset, FireRisk, consisting of 7 fire risk classes with a total of 91 872 labelled images for fire risk assessment. This remote sensing dataset is labelled with the fire risk classes supplied by the Wildfire Hazard Potential (WHP) raster dataset, and remote sensing images are collected using the National Agriculture Imagery Program (NAIP), a high-resolution remote sensing imagery program. On FireRisk, we present benchmark performance for supervised and self-supervised representations, with Masked Autoencoders (MAE) pre-trained on ImageNet1k achieving the highest classification accuracy, 65.29%.

1 PAPER • 1 BENCHMARK

GEval for KGRC-RDF-star

This repository is an extension of GEval. This repository contains a (software) evaluation framework to perform evaluation and comparison on RDF-star graph embedding techniques. The gold standard datasets for evaluation were created from KGRC-RDF-star. Please see here.

1 PAPER • NO BENCHMARKS YET

GO21

GO21 is a biomedical knowledge graph that models genes, proteins, drugs, and the hierarchy of the biological processes they participate in. It consists of 806,136 triples with 21 relations and 89127 entities. GO21 can be used for knowledge graph completion tasks (link prediction) as well as hierarchical reasoning tasks, such as ancestor-descendant prediction task proposed in the paper.

1 PAPER • 1 BENCHMARK

Genre2Movies

Genre2Movies (Compositional queries for Movie recommendation)

Genre annotations for movies The file genre2movies.csv contains genre-movie tuples based on Wikidata annotations (https://www.wikidata.org/).

1 PAPER • NO BENCHMARKS YET