23 dataset results for Relation Classification

TACRED (The TAC Relation Extraction Dataset)

TACRED is a large-scale relation extraction dataset with 106,264 examples built over newswire and web text from the corpus used in the yearly TAC Knowledge Base Population (TAC KBP) challenges. Examples in TACRED cover 41 relation types as used in the TAC KBP challenges (e.g., per:schools_attended and org:members) or are labeled as no_relation if none of the defined relations holds. The examples were created by combining human annotations available from the TAC KBP challenges with crowdsourcing.

185 PAPERS • 2 BENCHMARKS
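
The record layout below is a minimal sketch of what a TACRED-style example contains: a tokenized sentence, a subject span, an object span, and a label that is either one of the defined relation types or no_relation. The field names, the inclusive token offsets, the two-label subset, and the example sentence are illustrative assumptions, not the official release format.

```python
from dataclasses import dataclass

# Hypothetical subset of the label inventory: the real dataset has 41 relation
# types plus "no_relation"; only the two types named above are listed here.
RELATION_TYPES = {"per:schools_attended", "org:members"}
NO_RELATION = "no_relation"


@dataclass
class RelationExample:
    """One sentence with a subject span and an object span (inclusive token offsets)."""
    tokens: list[str]
    subj_start: int
    subj_end: int
    obj_start: int
    obj_end: int
    relation: str

    def subject(self) -> str:
        return " ".join(self.tokens[self.subj_start : self.subj_end + 1])

    def obj(self) -> str:
        return " ".join(self.tokens[self.obj_start : self.obj_end + 1])

    def is_positive(self) -> bool:
        # Positive means one of the defined relations holds between the two spans.
        return self.relation != NO_RELATION


def validate(example: RelationExample) -> None:
    """Reject labels that are neither a known relation type nor no_relation."""
    if example.relation != NO_RELATION and example.relation not in RELATION_TYPES:
        raise ValueError(f"unknown relation label: {example.relation}")


# Invented illustrative sentence, not taken from TACRED.
ex = RelationExample(
    tokens="Alice studied at Stanford University .".split(),
    subj_start=0, subj_end=0, obj_start=3, obj_end=4,
    relation="per:schools_attended",
)
validate(ex)
print(ex.subject(), "->", ex.obj(), ex.is_positive())
```

Keeping no_relation as an ordinary label value, rather than a missing label, lets negative examples flow through the same pipeline as positive ones.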

FewRel (Few-Shot Relation Classification Dataset)

The FewRel (Few-Shot Relation Classification Dataset) contains 100 relations and 70,000 instances from Wikipedia. The dataset is divided into three subsets: training set (64 relations), validation set (16 relations) and test set (20 relations).

170 PAPERS • 4 BENCHMARKS
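
Because the three FewRel subsets share no relations, few-shot models are commonly trained and evaluated on N-way K-shot episodes sampled within a single subset. The sketch below assumes the data is a dict from relation name to a list of instances and uses placeholder relations and sentences (100 relations with 700 instances each, matching the 70,000 total); the 64/16/20 assignment here is arbitrary, not the official split.

```python
import random

# Toy stand-in for FewRel: 100 relations x 700 instances = 70,000 instances.
# Relation names and instance strings are placeholders, not real data.
ALL_RELATIONS = [f"rel_{i:02d}" for i in range(100)]
data = {r: [f"{r}_sent_{j}" for j in range(700)] for r in ALL_RELATIONS}

# Relation-disjoint split (64 / 16 / 20); the real assignment is fixed by the dataset authors.
train_rels, val_rels, test_rels = ALL_RELATIONS[:64], ALL_RELATIONS[64:80], ALL_RELATIONS[80:]
train = {r: data[r] for r in train_rels}
test = {r: data[r] for r in test_rels}


def sample_episode(split, n_way, k_shot, n_query, rng):
    """Pick n_way relations, then k_shot support and n_query query instances per relation.

    Labels are episode-local indices 0..n_way-1, not global relation IDs.
    """
    relations = rng.sample(sorted(split), n_way)
    support, query = [], []
    for label, rel in enumerate(relations):
        picked = rng.sample(split[rel], k_shot + n_query)  # no overlap between support and query
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    rng.shuffle(query)
    return relations, support, query


rng = random.Random(0)
rels, support, query = sample_episode(test, n_way=5, k_shot=1, n_query=2, rng=rng)
print(rels, len(support), len(query))
```

Sampling only from the test relations at evaluation time is what makes the task few-shot: the model never saw those relations during training.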

FewRel 2.0

FewRel 2.0 is a more challenging task that investigates two aspects of few-shot relation classification models: (1) Can they adapt to a new domain with only a handful of instances? (2) Can they detect none-of-the-above (NOTA) relations?

38 PAPERS • NO BENCHMARKS YET
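
One way to picture the NOTA setting is an episode in which some queries belong to none of the N sampled relations and must receive an extra "none of the above" label. The sampler below is a hypothetical sketch assuming a dict-of-lists data layout; the nota_rate parameter, the use of label index N for NOTA, and the toy data are illustrative assumptions, not the official FewRel 2.0 sampling procedure.

```python
import random


def sample_nota_episode(split, n_way, k_shot, n_query, nota_rate, rng):
    """Sample an N-way K-shot episode whose queries may fall outside all N relations.

    split: dict mapping relation name -> list of instances (assumed layout).
    Queries drawn from an unsampled relation get label n_way ("none of the above").
    """
    relations = rng.sample(sorted(split), n_way)
    others = [r for r in sorted(split) if r not in relations]
    support, pools = [], {}
    for label, rel in enumerate(relations):
        shuffled = rng.sample(split[rel], len(split[rel]))
        support += [(x, label) for x in shuffled[:k_shot]]
        pools[rel] = shuffled[k_shot:]  # instances not used in the support set
    query = []
    for _ in range(n_query):  # queries are drawn with replacement
        if rng.random() < nota_rate:
            rel = rng.choice(others)
            query.append((rng.choice(split[rel]), n_way))  # NOTA label
        else:
            label = rng.randrange(n_way)
            query.append((rng.choice(pools[relations[label]]), label))
    return relations, support, query


# Toy data: 20 relations x 50 instances; each instance string encodes its relation.
toy = {f"rel_{i:02d}": [f"rel_{i:02d}|{j}" for j in range(50)] for i in range(20)}
rels, support, query = sample_nota_episode(
    toy, n_way=5, k_shot=2, n_query=100, nota_rate=0.5, rng=random.Random(1)
)
n_nota = sum(1 for _, y in query if y == 5)
print(rels, len(support), n_nota)
```

A classifier for this setting needs N + 1 outputs per episode, so the NOTA class has no support examples of its own and must be inferred from dissimilarity to all N relations.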

MATRES (Multi-Axis Temporal RElations for Start-points)

This is the Multi-Axis Temporal RElations for Start-points (MATRES) dataset.

13 PAPERS • 2 BENCHMARKS

CDCP (Cornell eRulemaking Corpus)

The Cornell eRulemaking Corpus (CDCP) is an argument mining corpus annotated with argumentative structure information that captures the evaluability of arguments. The corpus consists of 731 user comments on the Consumer Financial Protection Bureau's (CFPB) Consumer Debt Collection Practices (CDCP) rule; the resulting dataset contains 4,931 elementary unit annotations and 1,221 support relation annotations. It is a resource for building argument mining systems that not only extract arguments from unstructured text but also identify what additional information readers need to understand and evaluate a given argument. Immediate applications include giving commenters real-time feedback, for example by specifying which types of support can be added to which propositions to construct better-formed arguments.

11 PAPERS • 3 BENCHMARKS

Discovery

The Discovery datasets consist of adjacent sentence pairs (s1, s2) with a discourse marker (y) that occurred at the beginning of s2. They were extracted from the depcc web corpus.

9 PAPERS • 1 BENCHMARK
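
The extraction described above can be sketched as a scan over adjacent sentence pairs that keeps a pair whenever s2 opens with a known marker. The four-item marker list is a hypothetical placeholder, and removing the marker (plus a following comma) from s2 is an assumption about preprocessing rather than a documented detail of the depcc pipeline.

```python
# Hypothetical marker list for illustration; the real Discovery inventory is much larger.
MARKERS = ["however", "for example", "in fact", "therefore"]


def extract_pairs(sentences, markers=MARKERS):
    """Yield (s1, s2, y) for adjacent sentences where s2 begins with marker y.

    Assumption: the marker and any following comma are stripped from s2.
    """
    # Check longer markers first so a multi-word marker is not shadowed by a shorter one.
    ordered = sorted(markers, key=len, reverse=True)
    for s1, s2 in zip(sentences, sentences[1:]):
        lowered = s2.lower()
        for y in ordered:
            # Require a word boundary after the marker ("therefore" must not match "thereforex").
            if lowered.startswith(y) and (len(s2) == len(y) or not s2[len(y)].isalnum()):
                yield s1, s2[len(y):].lstrip(" ,"), y
                break


doc = ["The model is fast.", "However, it is inaccurate.", "It was trained on web text."]
for s1, s2, y in extract_pairs(doc):
    print(repr(s1), repr(s2), y)
```

Stripping the marker from s2 matters if the pairs are used to train a marker classifier: leaving it in would let a model read the label directly from the input.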

GUM (Georgetown University Multilayer corpus)

GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:

8 PAPERS • 1 BENCHMARK

RELX

RELX is a benchmark dataset for cross-lingual relation classification in English, French, German, Spanish and Turkish.

6 PAPERS • NO BENCHMARKS YET

LabPics (LabPics Dataset for computer vision for autonomous chemistry labs and medical labs)

LabPics Chemistry Dataset

5 PAPERS • NO BENCHMARKS YET

TACRED-Revisited

The TACRED-Revisited dataset improves the crowdsourced TACRED dataset for relation extraction by relabeling its dev and test sets with expert linguistic annotators. Relabeling focuses on the 5K most challenging instances in the dev and test sets; in total, 51.2% of these were corrected. It was published at ACL 2020.

5 PAPERS • 1 BENCHMARK

CrossRE

CrossRE is a cross-domain benchmark for Relation Extraction (RE) that comprises six distinct text domains and includes multi-label annotations. The dataset also includes metadata collected during annotation, such as explanations and flags for difficult instances.

4 PAPERS • NO BENCHMARKS YET

DISRPT2021 (DISRPT2021 shared task on Discourse Unit Segmentation, Connective Detection and Discourse Relation Classification)

The DISRPT 2021 shared task, co-located with CODI 2021 at EMNLP, introduces the second iteration of a cross-formalism shared task on discourse unit segmentation and connective detection, as well as the first iteration of a cross-formalism discourse relation classification task.

3 PAPERS • NO BENCHMARKS YET

DRI Corpus (Dr. Inventor Multi-layer Scientific Corpus)

The Dr. Inventor Multi-Layer Scientific Corpus (DRI Corpus) includes 40 Computer Graphics papers, selected by domain experts. Each paper of the Corpus has been annotated by three annotators by providing the following layers of annotations, each one characterizing a core aspect of scientific publications:

2 PAPERS • 2 BENCHMARKS

FREDo

FREDo is a Few-Shot Document-Level Relation Extraction Benchmark based on DocRED and SciERC. The dataset is divided into four subsets: training set (62 relations), validation set (16 relations), in-domain test set (16 relations), and cross-domain test set (7 relations).

2 PAPERS • 2 BENCHMARKS

PcMSP

PcMSP is a dataset for materials science information extraction, annotated from 305 open-access scientific articles. It contains synthesis sentences extracted from experimental paragraphs together with their entity mentions and intra-sentence relations.

2 PAPERS • NO BENCHMARKS YET

AbstRCT - Neoplasm

The AbstRCT dataset consists of randomized controlled trials retrieved from the MEDLINE database via PubMed search. The trials are annotated with argument components and argumentative relations.

1 PAPER • 2 BENCHMARKS

CORE (Company Relation Extraction)

1 PAPER • NO BENCHMARKS YET

MultiTACRED

MultiTACRED is a multilingual version of the large-scale TAC Relation Extraction Dataset. It covers 12 typologically diverse languages from 9 language families, and was created by the Speech & Language Technology group of DFKI by machine-translating the instances of the original TACRED dataset and automatically projecting their entity annotations. For details of the original TACRED data collection and annotation process, see the Stanford paper. Translations are syntactically validated by checking the correctness of the XML tag markup. Any translations with an invalid tag structure, e.g., missing or invalid head or tail tag pairs, are discarded (on average, 2.3% of the instances).

1 PAPER • NO BENCHMARKS YET
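
The tag check described above can be sketched with a regular expression. The tag names <H> and <T> for head and tail are hypothetical, as is the extra requirement that each tagged span be non-empty; the sketch only accepts a translation with exactly one head pair and one tail pair that are neither nested nor crossing.

```python
import re

# Hypothetical tag names: <H>...</H> marks the head entity, <T>...</T> the tail entity.
TAG_RE = re.compile(r"</?[HT]>")
# With exactly one pair each, only these two tag sequences are well-formed and non-overlapping.
VALID_ORDERS = (["<H>", "</H>", "<T>", "</T>"], ["<T>", "</T>", "<H>", "</H>"])


def has_valid_tags(text: str) -> bool:
    matches = list(TAG_RE.finditer(text))
    tags = [m.group(0) for m in matches]
    if tags not in VALID_ORDERS:  # catches missing, duplicated, nested, or crossing tags
        return False
    # Assumption: each span between an opening tag and its closing tag must contain text.
    for opening, closing in ((matches[0], matches[1]), (matches[2], matches[3])):
        if not text[opening.end():closing.start()].strip():
            return False
    return True


def filter_translations(translations):
    """Keep translations with valid markup and report how many were discarded."""
    kept = [t for t in translations if has_valid_tags(t)]
    return kept, len(translations) - len(kept)


kept, discarded = filter_translations([
    "<H>Alice</H> studierte an der <T>Stanford</T> .",
    "<H>Alice</H> studierte an der Stanford .",  # tail tags lost in translation
])
print(len(kept), discarded)
```

Comparing the full tag sequence against the two allowed orders keeps the check simple: any translation that dropped, duplicated, or reordered a tag produces a different sequence and is rejected.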

Relational Pattern Similarity Dataset

The relational pattern similarity dataset is a new dataset built upon the work of Zeichner et al. (2012); it consists of relational patterns annotated with semantic inference labels. The dataset includes 5,555 pairs extracted by Reverb (Fader et al., 2011), of which 2,447 hold an inference relation and the remaining 3,108 do not.

1 PAPER • NO BENCHMARKS YET

SupplyGraph (SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks)

Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bioinformatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major obstacle to this approach is the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fact

1 PAPER • NO BENCHMARKS YET

TexRel

Green family of datasets for emergent communications on relations.

1 PAPER • NO BENCHMARKS YET

Translated TACRED

533 parallel examples sampled from TACRED and translated into Russian and Korean (plus 3 additional examples in Russian), accompanied by translations of a list of trigger words collected for the different relations.

1 PAPER • NO BENCHMARKS YET

TFH_Annotated_Dataset (Thin_Film_head_relevant_Patent_Annotated_Dataset)

TFH_Annotated_Dataset is an annotated patent dataset on thin film head technology in hard disks. To the best of our knowledge, it is the second publicly available labeled patent dataset in the technology management domain that annotates both entities and the semantic relations between them; the first is [1].

0 PAPERS • NO BENCHMARKS YET
