BioRED: A Rich Biomedical Relation Extraction Dataset

8 Apr 2022  ·  Ling Luo, Po-Ting Lai, Chih-Hsuan Wei, Cecilia N Arighi, Zhiyong Lu

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g., protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types (e.g., gene/protein, disease, chemical) and relation pairs (e.g., gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Further, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including BERT-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient, and robust RE systems for biomedicine. The BioRED dataset and annotation guideline are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.

Datasets


Introduced in the Paper:

BioRED

Used in the Paper:

BC5CDR, NCBI Disease

Results from the Paper


Task                            Dataset  Model           Metric  Value  Global Rank
Named Entity Recognition (NER)  BioRED   PubMedBERT-CRF  F1      89.3   #1
Named Entity Recognition (NER)  BioRED   BiLSTM-CRF      F1      87.1   #3
Named Entity Recognition (NER)  BioRED   BioBERT-CRF     F1      88.7   #2
Relation Extraction             BioRED   PubMedBERT      F1      58.9   #1
Binary Relation Extraction      BioRED   PubMedBERT      F1      72.9   #1
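
The F1 values reported above are the harmonic mean of precision and recall. As a rough illustration of how such a score could be computed for document-level relation extraction, the sketch below compares a set of predicted relations against a gold set. It is not the paper's evaluation script: the tuple layout, the `normalize` and `precision_recall_f1` helpers, the identifiers in the usage example, and the assumption that entity pairs are unordered are all hypothetical choices made for this example.

```python
from typing import Iterable, Tuple

# Hypothetical relation layout (an assumption, not the BioRED file format):
# (document ID, entity ID 1, entity ID 2, relation type)
Relation = Tuple[str, str, str, str]


def normalize(rel: Relation) -> Relation:
    """Sort the entity pair so (a, b) and (b, a) compare equal.

    Treating relations as undirected is an assumption of this sketch.
    """
    doc_id, e1, e2, rel_type = rel
    first, second = sorted((e1, e2))
    return (doc_id, first, second, rel_type)


def precision_recall_f1(
    gold: Iterable[Relation], predicted: Iterable[Relation]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact relation matches."""
    gold_set = {normalize(r) for r in gold}
    pred_set = {normalize(r) for r in predicted}

    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up identifiers, purely for illustration.
    gold = [
        ("doc1", "G1", "D1", "Association"),
        ("doc1", "C1", "D1", "Positive_Correlation"),
        ("doc2", "G2", "C2", "Association"),
    ]
    predicted = [
        ("doc1", "D1", "G1", "Association"),  # same pair, reversed order
        ("doc2", "G2", "C2", "Negative_Correlation"),  # wrong relation type
    ]
    p, r, f1 = precision_recall_f1(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Under this layout, a binary variant (does any relation hold between the pair?) could be scored by dropping the relation type from each tuple before comparison; whether that matches the Binary Relation Extraction row above is also an assumption.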
