SciFive: a text-to-text transformer model for biomedical literature

In this report, we introduce SciFive, a domain-specific T5 model pre-trained on large biomedical corpora. Our model outperforms the current SOTA methods (i.e. BERT, BioBERT, Base T5) on tasks in named entity recognition, relation extraction, natural language inference, and question answering. We show that text-generation methods have significant potential in a broad array of biomedical NLP tasks, particularly those requiring longer, more complex outputs. Our results support the exploration of more difficult text generation tasks and the development of new methods in this area.
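The core idea of a T5-style model like SciFive is that every task, whether NER, relation extraction, or NLI, is cast as mapping an input string to an output string. The sketch below illustrates that framing; the task prefixes and label strings are illustrative assumptions for this example, not the exact serialization used by the SciFive authors.

```python
# Illustrative sketch of the text-to-text framing used by T5-style models.
# Prefixes and label verbalizations here are assumptions for demonstration,
# not the exact strings from the SciFive paper.

def to_text_to_text(task: str, text: str, label: str) -> tuple[str, str]:
    """Turn a (task, input, label) triple into a (source, target) string pair.

    The model is then trained to generate `target` given `source`, so a
    single seq2seq architecture covers classification, NER, and inference.
    """
    prefixes = {
        "mednli": "mednli: ",      # natural language inference (hypothetical prefix)
        "chemprot": "chemprot: ",  # relation extraction (hypothetical prefix)
        "ncbi_ner": "ncbi_ner: ",  # named entity recognition (hypothetical prefix)
    }
    return prefixes[task] + text, label

# Example: an NLI pair becomes one source string and one target string.
src, tgt = to_text_to_text(
    "mednli",
    "sentence1: The patient is afebrile. sentence2: The patient has a fever.",
    "contradiction",
)
print(src)
print(tgt)
```

Because the output is free text rather than a fixed label index, the same model can also produce the longer, more complex outputs the abstract highlights.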

Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Named Entity Recognition (NER) | BC5CDR-chemical | SciFive-Large | F1 | 94.76 | # 2 |
| Named Entity Recognition (NER) | BC5CDR-disease | SciFive-Large | F1 | 87.62 | # 3 |
| Relation Extraction | ChemProt | BioT5X (base) | F1 | 77.40 | # 6 |
| Relation Extraction | ChemProt | SciFive-Large | F1 | 78 | # 4 |
| Drug–drug Interaction Extraction | DDI extraction 2013 corpus | SciFive-Large | F1 | 0.8367 | # 2 |
| Drug–drug Interaction Extraction | DDI extraction 2013 corpus | SciFive-Large | Micro F1 | 83.67 | # 2 |
| Document Classification | HOC | SciFive-Large | F1 | 86.08 | # 3 |
| Named Entity Recognition (NER) | JNLPBA | SciFive-Large | F1 | 77.55 | # 14 |
| Natural Language Inference | MedNLI | SciFive-Large | Accuracy | 86.57 | # 1 |
| Named Entity Recognition (NER) | NCBI-disease | SciFive-Base | F1 | 89.39 | # 7 |
| Named Entity Recognition (NER) | Species-800 | SciFive-Base | F1 | 76.55 | # 2 |