56 dataset results for Graphs AND Texts AND English

WikiGraphs is a dataset of Wikipedia articles, each paired with a knowledge graph, to facilitate research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (one or a few sentences), thus limiting the capabilities of the models that can be learned on the data. WikiGraphs is collected by pairing each Wikipedia article from the established WikiText-103 benchmark with a subgraph from the Freebase knowledge graph. Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets.

3 PAPERS • 1 BENCHMARK

HeriGraph (Multimodal Machine Learning Datasets on Graphs of Heritage Values and Attributes)

The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links) constructed

1 PAPER • NO BENCHMARKS YET

SciGraphQA

SciGraphQA is a large-scale, open-domain dataset focused on generating multi-turn conversational question-answering dialogues centered around understanding and describing scientific graphs and figures. Each sample in SciGraphQA consists of a scientific graph image sourced from papers on ArXiv, accompanied by rich textual context including the paper's title, abstract, figure caption, and a paragraph … The key motivation behind SciGraphQA is providing a large-scale resource to support research and development of multi-modal AI systems that can engage in informative, open-ended conversations about graphs … Potential use cases of SciGraphQA include pre-training and benchmarking multi-modal conversational models for scientific graph comprehension, building AI assistants that can discuss data insights, and … The academic source material also provides a way to evaluate model capabilities on expert-level graphs spanning diverse topics and complex visual encodings.

3 PAPERS • 1 BENCHMARK

Microsoft Academic Graph

The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences

116 PAPERS • 1 BENCHMARK

Worldtree

Worldtree is a corpus of explanation graphs, explanatory role ratings, and an associated tablestore. It provides explanation graphs for 1,680 questions and 4,950 tablestore rows across 62 semi-structured tables. This data is intended to be paired with the AI2 Mercury Licensed questions.

33 PAPERS • NO BENCHMARKS YET

MIMIC-SPARQL

…EHR data are typically stored in a relational database, which can also be converted to a directed acyclic graph, allowing two approaches for EHR QA: table-based QA and knowledge graph-based QA. The MIMIC-SPARQL dataset provides graph-based EHR QA data where natural language queries are converted to SPARQL instead of SQL.

2 PAPERS • NO BENCHMARKS YET
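
The relational-to-graph conversion mentioned in the MIMIC-SPARQL entry can be illustrated with a minimal sketch. It assumes a toy two-table schema; the table names, column names, and helper functions below are hypothetical and are not taken from the dataset itself. Each row becomes a set of (subject, predicate, object) triples, and a query is answered by chaining triple patterns, which is roughly what a SPARQL query does.

```python
# Toy, hypothetical tables: these names are illustrative, not the dataset's schema.
patients = [
    {"subject_id": 1, "gender": "F"},
    {"subject_id": 2, "gender": "M"},
]
admissions = [
    {"hadm_id": 10, "subject_id": 1, "admission_type": "EMERGENCY"},
    {"hadm_id": 11, "subject_id": 2, "admission_type": "ELECTIVE"},
]


def rows_to_triples(table, key, rows, foreign_keys=None):
    """Turn each row into (subject, predicate, object) triples.

    foreign_keys maps a column name to the table it references, so that the
    column becomes an edge to another node instead of a literal value.
    """
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        subject = f"{table}/{row[key]}"
        for column, value in row.items():
            if column == key:
                continue
            if column in foreign_keys:
                value = f"{foreign_keys[column]}/{value}"
            triples.append((subject, column, value))
    return triples


triples = rows_to_triples("patient", "subject_id", patients) + rows_to_triples(
    "admission", "hadm_id", admissions, {"subject_id": "patient"}
)


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard variable."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def gender_of_emergency_patients(triples):
    """Chain three triple patterns: admission type -> patient -> gender."""
    results = []
    for admission, _, _ in match(triples, p="admission_type", o="EMERGENCY"):
        for _, _, patient in match(triples, s=admission, p="subject_id"):
            for _, _, gender in match(triples, s=patient, p="gender"):
                results.append(gender)
    return results
```

In SQL the same question would be a join between the two tables; in the graph form it is a walk along two edges, which is the contrast the entry draws between table-based and graph-based QA.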

Taskography (PDDLGym Taskography)

PDDL dataset of Rearrangement tasks in large-scale 3D scene graphs.

1 PAPER • NO BENCHMARKS YET

HatefulDiscussions

Multi-Modal Hate Speech Detection with Graph Context. 18k+ labels, 8k+ discussions, 900k+ comments.

1 PAPER • NO BENCHMARKS YET

tida-gcn-data

The datasets of "Time Interval-enhanced Graph Neural Network for Shared-account Cross-domain Sequential Recommendation" (TNNLS 2022)

1 PAPER • NO BENCHMARKS YET

LEA-GCN-dataset

The datasets of "Towards Lightweight Cross-domain Sequential Recommendation via External Attention-enhanced Graph Convolution Network" (DASFAA 2023)

1 PAPER • NO BENCHMARKS YET

UPFD (User Preference-aware Fake News Detection)

…The dataset has been integrated with PyTorch Geometric (PyG) and Deep Graph Library (DGL). You can load the dataset after installing the latest versions of PyG or DGL. The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The news retweet graphs were originally extracted by FakeNewsNet. Each graph is a hierarchical tree-structured graph where the root node represents the news; the leaf nodes are Twitter users who retweeted the root news. The dataset statistics are shown below: | Data | #Graphs | #Fake News| #Total Nodes | #Total Edges | #Avg.

7 PAPERS • 2 BENCHMARKS
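
To make the tree structure in the UPFD entry concrete, here is a minimal standard-library sketch. It does not use the PyG or DGL loaders; the edge list and the helper names are hypothetical toy values. Node 0 stands for the news item, every other node stands for a user, and edges point from parent to child.

```python
from collections import defaultdict

# Hypothetical toy propagation tree: node 0 is the news, other nodes are users.
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]


def build_children(edges):
    """Map each parent node to the list of its children."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return children


def all_nodes(edges, root=0):
    """Collect every node that appears in the edge list, plus the root."""
    return {root} | {node for edge in edges for node in edge}


def is_rooted_tree(edges, root=0):
    """Check that the root has no parent, every other node has exactly one
    parent, and every node is reachable from the root."""
    nodes = all_nodes(edges, root)
    parent_count = defaultdict(int)
    for _, child in edges:
        parent_count[child] += 1
    if parent_count[root] != 0:
        return False
    if any(parent_count[node] != 1 for node in nodes - {root}):
        return False
    children = build_children(edges)
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return seen == nodes


def leaves(edges, root=0):
    """Return the nodes without children, in sorted order."""
    children = build_children(edges)
    return sorted(n for n in all_nodes(edges, root) if not children.get(n))
```

A graph-classification model for this kind of data would receive one such tree per news item together with a single fake/real label for the whole graph.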

VerbCL

VerbCL is a dataset that consists of the citation graph of court opinions, which cite previously published court opinions in support of their arguments. VerbCL is derived from CourtListener and introduces the task of highlight extraction as a single-document summarization task based on the citation graph.

1 PAPER • NO BENCHMARKS YET

Hateful Users on Twitter

This is a Twitter dataset of 100,386 users, along with up to 200 tweets from their timelines, collected with a random-walk-based crawler on the retweet graph, with a subsample of 4,972 users that is manually annotated … The dataset can be used to examine the difference between user activity patterns, the content disseminated between hateful and normal users, and network centrality measurements in the sampled graph.

2 PAPERS • NO BENCHMARKS YET

CTFW

…It is used to generate flow graphs from procedural texts.

1 PAPER • NO BENCHMARKS YET

SemEval-2021 Task-11

NLPContributionGraph was introduced as Task 11 at SemEval 2021 for the first time. The task is defined on a dataset of Natural Language Processing (NLP) scholarly articles with their contributions structured to be integrable within Knowledge Graph infrastructures such as the Open Research Knowledge Graph.

8 PAPERS • NO BENCHMARKS YET

SSN (Semantic Scholar Network)

…The entire dataset constitutes a large connected citation graph.

5 PAPERS • NO BENCHMARKS YET

AISECKG (AISecKG: Knowledge Graph Dataset for Cybersecurity Education)

…Knowledge graphs (KG) provide a visual representation in a graph that can reason and interpret from the underlying data, making them suitable for use in education and interactive learning. Creating knowledge graphs from unstructured text is challenging without an ontology or annotated dataset. However, data annotation for cybersecurity needs domain experts. This dataset can be used to construct knowledge graphs to teach cybersecurity and promote cognitive learning.

1 PAPER • NO BENCHMARKS YET

Amazon Product Data

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

33 PAPERS • 6 BENCHMARKS

Logic2Text

…The logical forms show diversified graph structures with a free schema, which poses great challenges to the model's ability to understand the semantics.

8 PAPERS • NO BENCHMARKS YET

PubMedCite

PubMedCite is a domain-specific dataset with about 192K biomedical scientific papers and a large citation graph preserving 917K citation relationships between them.

3 PAPERS • NO BENCHMARKS YET

QAMPARI

…It was created by (a) generating questions with multiple answers from Wikipedia's knowledge graph and tables, (b) automatically pairing answers with supporting evidence in Wikipedia paragraphs, and (c)

11 PAPERS • NO BENCHMARKS YET

RoMQA

…RoMQA contains clusters of questions that are derived from related constraints mined from the Wikidata knowledge graph.

2 PAPERS • NO BENCHMARKS YET

TREx-2p

…It has been built by manually examining the 2-hop links existing in the knowledge graph of TREx-1p and selecting eight 2-hop relation types that make sense to humans

1 PAPER • NO BENCHMARKS YET

VANiLLa

VANiLLa is a dataset for Question Answering over Knowledge Graphs (KGQA) offering answers in natural language sentences.

4 PAPERS • NO BENCHMARKS YET

JerichoWorld

JerichoWorld is a dataset that enables the creation of learning agents that can build knowledge graph-based world models of interactive narratives. JerichoWorld provides 24,198 mappings between rich natural language observations and: (1) knowledge graphs that reflect the world state in the form of a map; (2) natural language actions that are guaranteed

5 PAPERS • NO BENCHMARKS YET

WikiHop

…A bipartite graph connecting entities and documents is first built and the answer for each query is located by traversal on this graph.

67 PAPERS • 2 BENCHMARKS
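
The traversal described in the WikiHop entry can be sketched as a breadth-first search over an entity-document bipartite graph. This is a minimal illustration under stated assumptions: the documents, entity names, and function names are hypothetical toy values, and the sketch is not the dataset's actual construction pipeline.

```python
from collections import deque

# Hypothetical toy corpus: document id -> entities mentioned in that document.
documents = {
    "d1": {"Hanging Gardens", "Mumbai"},
    "d2": {"Mumbai", "India"},
    "d3": {"Arabian Sea", "Mumbai"},
}


def build_bipartite(documents):
    """Build adjacency sets linking ("ent", name) nodes and ("doc", id) nodes.

    Edges only ever connect an entity to a document, so the graph is bipartite.
    """
    graph = {}
    for doc_id, entities in documents.items():
        for entity in entities:
            graph.setdefault(("doc", doc_id), set()).add(("ent", entity))
            graph.setdefault(("ent", entity), set()).add(("doc", doc_id))
    return graph


def shortest_path(graph, start_entity, candidate):
    """Breadth-first search from the query entity to a candidate answer.

    Returns the alternating entity/document path, or None if unreachable.
    """
    start, goal = ("ent", start_entity), ("ent", candidate)
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node[1])
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic order
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_bipartite(documents)
path = shortest_path(graph, "Hanging Gardens", "India")
```

Here the path passes through two documents, which is the sense in which answering the query needs more than one hop: no single document mentions both the query entity and the answer.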

AMR Bank (Abstract Meaning Representation)

…Each AMR is a single rooted, directed graph. AMRs include PropBank semantic roles, within-sentence coreference, named entities and types, modality, negation, questions, quantities, and so on.

22 PAPERS • 1 BENCHMARK

LLM Generated Spear Phishing Emails

This dataset comprises high-quality, targeted spear-phishing emails created using a proprietary system that harnesses the power of LLMs and knowledge graphs.

1 PAPER • NO BENCHMARKS YET

SOMD (SOftware Mention Detection)

…The data is derived from the SoMeSci Knowledge Graph of software mentions. Subtask 1 deals with the recognition of software mentions and the classification of mention (e.g. Krüger, “SoMeSci—A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles,” in Proceedings of the 30th ACM International Conference on Information and Knowledge Management

1 PAPER • NO BENCHMARKS YET

InferWiki

InferWiki is a Knowledge Graph Completion (KGC) dataset that improves upon existing benchmarks in inferential ability, assumptions, and patterns.

4 PAPERS • NO BENCHMARKS YET

ChEBI-20

…Given a text query and a list of molecules without any reference textual information (represented, for example, as SMILES strings, graphs, or other equivalent representations), retrieve the molecule corresponding … This requires the integration of two very different types of information: the structured knowledge represented by text and the chemical properties present in molecular graphs.

22 PAPERS • 4 BENCHMARKS

ConvQuestions

ConvQuestions is the first realistic benchmark for conversational question answering over knowledge graphs. It contains 11,200 conversations which can be evaluated over Wikidata. For suitability to knowledge graphs, questions were constrained to be objective or factoid in nature, but no other restrictive guidelines were set.

15 PAPERS • NO BENCHMARKS YET

TextWorld KG

TextWorld KG is a dynamic Knowledge Graph (KG) extraction dataset. It is based on a set of text-based games generated using …

1 PAPER • NO BENCHMARKS YET

DBLP-QuAD (DBLP Question Answering Dataset)

In this work we create a question answering dataset over the DBLP scholarly knowledge graph (KG).

6 PAPERS • NO BENCHMARKS YET

JUSThink Dialogue and Actions Corpus

…and test responses of children aged 9 through 12, as they participate in a robot-mediated human-human collaborative learning activity named JUSThink, where children in teams of two solve a problem on graphs

1 PAPER • NO BENCHMARKS YET

ParaQA

ParaQA is a question answering (QA) dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KG).

5 PAPERS • NO BENCHMARKS YET

FreebaseQA

FreebaseQA is a data set for open-domain QA over the Freebase knowledge graph.

14 PAPERS • NO BENCHMARKS YET

KAMEL (Knowledge Analysis with Multitoken Entities in Language Models)

…Most importantly we overcome the limitations of existing probing datasets by (1) having a larger variety of knowledge graph relations, (2) it contains single- and multi-token entities, (3) we use relations

5 PAPERS • 1 BENCHMARK

TRIPOD (TuRnIng POint Dataset)

…TRIPOD is extended in "Movie Summarization via Sparse Graph Construction" with more movies in the test set (122 now in total) and multimodal features extracted from the full-length movie videos.

11 PAPERS • NO BENCHMARKS YET

COPA-SSE

…With their familiar format, the explanations are geared towards commonsense reasoners operating on knowledge graphs and serve as a starting point for ongoing work on improving such systems.

3 PAPERS • NO BENCHMARKS YET

LDC2017T10 (Abstract Meaning Representation (AMR) Annotation Release 2.0)

…Each sentence is paired with a graph that represents its whole-sentence meaning in a tree structure.

27 PAPERS • 2 BENCHMARKS

ConQA (Conceptual Query Answering)

…Filtering images: The first step focuses on filtering images that have meaningful scene graphs and captions. We filtered out all the scene graphs that did not contain any edges. … images pass this filter. The relationships should be verbs and not contain nouns or pronouns. We filter out all scene graphs that contain an edge that is not tagged as a verb or whose tag is not in an ad-hoc list of allowed non-verb keywords.

1 PAPER • 2 BENCHMARKS

LDC2020T02 (Abstract Meaning Representation (AMR) Annotation Release 3.0)

…Each sentence is paired with a graph that represents its whole-sentence meaning in a tree structure.

9 PAPERS • 2 BENCHMARKS

MuMiN

MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags), spanning 21 million tweets belonging to 26 thousand Twitter threads, each

4 PAPERS • 3 BENCHMARKS

AlgoPuzzleVQA

…We create the puzzles to encompass a diverse array of mathematical and algorithmic topics such as boolean logic, combinatorics, graph theory, optimization, search, etc., aiming to evaluate the gap between

1 PAPER • 1 BENCHMARK

QALD-9-Plus

QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9.

1 PAPER • 1 BENCHMARK

Study data

Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study. File descriptions: commit_categorizations.csv: categorizations for the commits …

2 PAPERS • NO BENCHMARKS YET

FMC-MWO2KG (The MWO2KG Failure Mode Classification Dataset)

The Failure Mode Classification dataset released in the paper "MWO2KG and Echidna: Constructing and exploring knowledge graphs from maintenance data" by Stewart et al.

1 PAPER • 1 BENCHMARK