Dataset: Relationship extraction for knowledge graph creation from biomedical literature (Gene-Disease relationships)

This dataset is used to classify the type of Gene-Disease relationship expressed in a sentence. It consists of 3 files:

  • manually_annotated_set.xlsx - set of 2000 manually annotated sentences with their entities marked.
  • Unbalanced_dataset.xlsx - set of 12000 sentences: the 2000 manually annotated sentences from the first set, plus 10000 sentences added by a rule-based method that kept only sentences whose extraction had confidence 1.
  • Balanced_dataset_SUB_PRED.xlsx - balanced dataset built from the 2000 manually annotated sentences plus confidence-1 sentences from the rule-based method, added so that each relationship class has at least 1400 sentences. The exception is the biomarker class: only 1243 sentences with confidence 1 could be obtained for it from the portion of the data that had been processed when the dataset was built.
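The balancing procedure described for Balanced_dataset_SUB_PRED.xlsx can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' actual code: the `Sentence` record, its field names, the class labels in the usage example, and the choice to take eligible rule-based sentences in input order are all hypothetical. It starts from the manually annotated set and tops up each relationship class with confidence-1 rule-based sentences until the class reaches the target; a class that runs out of eligible candidates (as biomarkers did) simply stays below the target.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    # Hypothetical record; the real .xlsx column names are not documented here.
    text: str
    relation: str             # relationship class label (assumed field name)
    confidence: float = 1.0   # rule-based extraction confidence (assumed field)


def build_balanced(manual, rule_based, target=1400):
    """Sketch: keep all manual sentences, then add confidence-1 rule-based
    sentences per class until each class has at least `target` sentences."""
    dataset = list(manual)
    counts = Counter(s.relation for s in dataset)
    for s in rule_based:
        if s.confidence != 1:
            continue  # only fully confident rule-based extractions are eligible
        if counts[s.relation] < target:
            dataset.append(s)
            counts[s.relation] += 1
    return dataset, counts


if __name__ == "__main__":
    manual = [Sentence("a", "therapeutic"), Sentence("b", "biomarker")]
    rule_based = [
        Sentence("c", "therapeutic", 1.0),
        Sentence("d", "therapeutic", 1.0),
        Sentence("e", "biomarker", 0.8),  # skipped: confidence below 1
        Sentence("f", "biomarker", 1.0),
        Sentence("g", "therapeutic", 1.0),  # skipped: class already at target
    ]
    data, counts = build_balanced(manual, rule_based, target=3)
    print(len(data), dict(counts))
```

With a toy target of 3, the "therapeutic" class reaches the target while "biomarker" stays at 2 because no further confidence-1 candidates exist, mirroring the shortfall reported for the biomarker class.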
