A Dataset for N-ary Relation Extraction of Drug Combinations

Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a challenge in identifying effective combination therapies available in a situation. To assist medical professionals in identifying beneficial drug-combinations, we construct an expert-annotated dataset for extracting information about the efficacy of drug combinations from the scientific literature. Beyond its practical utility, the dataset also presents a unique NLP challenge, as the first relation extraction dataset consisting of variable-length relations. Furthermore, the relations in this dataset predominantly require language understanding beyond the sentence level, adding to the challenge of this task. We provide a promising baseline model and identify clear areas for further improvement. We release our dataset, code, and baseline models publicly to encourage the NLP community to participate in this task.

NAACL 2022

Datasets


Introduced in the Paper:

Drug Combination Extraction Dataset
Task: Drug–drug Interaction Extraction
Dataset: Drug Combination Extraction Dataset
Model: PubmedBERT + PURE (domain-adapted)

Metric Name | Metric Value | Global Rank
Exact Match F1 ("Any Combination") | 69.4 | # 2
Exact Match F1 ("Positive Combination") | 61.8 | # 2
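
The metrics above are exact-match F1 scores over variable-length relations. The listing does not say how the score is computed, so the following is only a minimal sketch of one plausible reading: a relation is an unordered set of drug names plus a label, and a prediction counts as correct only if both the full drug set and the label equal a gold relation. The drug names, the label strings "POS" and "COMB", and the helper names are hypothetical and are not taken from the paper's released code.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Assumption: a relation is an unordered, variable-length set of drug names
# plus a label. This representation is illustrative, not the dataset's schema.
Relation = Tuple[FrozenSet[str], str]


def make_relation(drugs: Iterable[str], label: str) -> Relation:
    """Build a relation whose drug set ignores order and letter case."""
    return (frozenset(d.lower() for d in drugs), label)


def exact_match_f1(gold: Iterable[Relation], predicted: Iterable[Relation]) -> float:
    """F1 where a prediction is correct only if its drug set and label
    both equal those of some gold relation."""
    gold_set: Set[Relation] = set(gold)
    pred_set: Set[Relation] = set(predicted)
    # Convention chosen for this sketch: an empty side scores 0.0.
    if not gold_set or not pred_set:
        return 0.0
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one three-drug relation and one two-drug relation.
gold = [
    make_relation(["drug_a", "drug_b", "drug_c"], "POS"),
    make_relation(["drug_d", "drug_e"], "COMB"),
]
predicted = [
    make_relation(["drug_c", "drug_b", "drug_a"], "POS"),  # same set, different order
    make_relation(["drug_d", "drug_f"], "COMB"),           # one drug wrong, so no match
]
score = exact_match_f1(gold, predicted)
```

Under this reading, a prediction that recovers only part of a combination earns no credit, which is one reason variable-length relations are harder to score well on than fixed binary ones.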
