Dataset: Relationship extraction for knowledge graph creation from biomedical literature (Gene-Disease relationships)

This dataset is used to classify the type of Gene-Disease relationship expressed in a sentence. It consists of 3 files:

manually_annotated_set.xlsx - set of 2000 manually annotated sentences with entities
Unbalanced_dataset.xlsx - set of 12000 sentences: the 2000 manually annotated sentences from the first set, plus sentences added by a rule-based method, keeping only those where the extraction had confidence 1.
Balanced_dataset_SUB_PRED.xlsx - balanced dataset built from the 2000 manually annotated sentences plus rule-based sentences with confidence 1, added so that each relationship class has at least 1400 sentences (for biomarkers, we could obtain only 1243 sentences with confidence 1 from the portion of the data processed at the time of building the dataset).
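
The balancing step described above can be illustrated with a minimal sketch. This is not the authors' code: the field names (sentence, label, confidence), the function name build_balanced, the class name "other", and the toy rows are all hypothetical, and the real column names in the .xlsx files may differ. The sketch keeps every manually annotated sentence and then tops up each class with rule-based sentences of confidence 1 until a per-class target is reached; a class with too few confident sentences simply stays below the target, as happened for biomarkers.

```python
from collections import defaultdict


def build_balanced(manual, rule_based, target_per_class=1400, min_confidence=1.0):
    """Group manual rows by class, then top up each class with confident rule-based rows.

    All field names ("sentence", "label", "confidence") are assumptions made
    for this sketch; they are not taken from the dataset files.
    """
    by_class = defaultdict(list)
    seen = set()

    # Every manually annotated sentence is kept, whatever the class size.
    for row in manual:
        by_class[row["label"]].append(row)
        seen.add(row["sentence"])

    # Add rule-based sentences only if they are confident, new, and still needed.
    for row in rule_based:
        if row["confidence"] < min_confidence:
            continue
        if row["sentence"] in seen:
            continue
        bucket = by_class[row["label"]]
        if len(bucket) >= target_per_class:
            continue
        bucket.append(row)
        seen.add(row["sentence"])

    return dict(by_class)


# Toy data (hypothetical), with a small target so the effect is visible.
manual = [
    {"sentence": "s1", "label": "biomarker"},
    {"sentence": "s2", "label": "other"},
]
rule_based = [
    {"sentence": "s3", "label": "biomarker", "confidence": 1.0},
    {"sentence": "s4", "label": "biomarker", "confidence": 0.8},  # dropped: low confidence
    {"sentence": "s5", "label": "other", "confidence": 1.0},
    {"sentence": "s6", "label": "other", "confidence": 1.0},      # dropped: class already full
    {"sentence": "s1", "label": "biomarker", "confidence": 1.0},  # dropped: duplicate of manual
]
balanced = build_balanced(manual, rule_based, target_per_class=2)
sizes = {label: len(rows) for label, rows in balanced.items()}
```

With real data, the two input lists would be read from the spreadsheets first, and target_per_class would be set to 1400.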


Task	Dataset Variant	Best Model
Relation Extraction	Dataset: Relationship extraction for knowledge graph creation from biomedical literature (Gene-Disease relationships)	DistilBERT
