SciBERT: A Pretrained Language Model for Scientific Text

IJCNLP 2019 · Iz Beltagy, Kyle Lo, Arman Cohan

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/.


Results from the Paper


 Ranked #1 on Sentence Classification on Paper Field (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Sentence Classification | ACL-ARC | SciBERT | F1 | 70.98 | #2 |
| Named Entity Recognition | BC5CDR | SciBERT (Base Vocab) | F1 | 88.11 | #8 |
| Named Entity Recognition | BC5CDR | SciBERT (SciVocab) | F1 | 88.94 | #7 |
| Relation Extraction | ChemProt | SciBERT (Base Vocab) | F1 | 73.7 | #9 |
| Relation Extraction | ChemProt | SciBERT (Finetune) | F1 | 83.64 | #2 |
| Participant Intervention Comparison Outcome Extraction | EBM-NLP | SciBERT (Base Vocab) | F1 | 70.82 | #3 |
| Participant Intervention Comparison Outcome Extraction | EBM-NLP | SciBERT (SciVocab) | F1 | 71.18 | #2 |
| Dependency Parsing | GENIA - LAS | SciBERT (Base Vocab) | F1 | 91.26 | #3 |
| Dependency Parsing | GENIA - LAS | SciBERT (SciVocab) | F1 | 91.41 | #2 |
| Dependency Parsing | GENIA - UAS | SciBERT (Base Vocab) | F1 | 92.32 | #3 |
| Dependency Parsing | GENIA - UAS | SciBERT (SciVocab) | F1 | 92.46 | #2 |
| Relation Extraction | JNLPBA | SciBERT (SciVocab) | F1 | 76.09 | #1 |
| Named Entity Recognition | JNLPBA | SciBERT (Base Vocab) | F1 | 75.77 | #9 |
| Named Entity Recognition | NCBI-disease | SciBERT (SciVocab) | F1 | 86.45 | #14 |
| Named Entity Recognition | NCBI-disease | SciBERT (Base Vocab) | F1 | 86.88 | #13 |
| Sentence Classification | Paper Field | SciBERT (SciVocab) | F1 | 65.71 | #1 |
| Sentence Classification | Paper Field | SciBERT (Base Vocab) | F1 | 64.02 | #2 |
| Sentence Classification | PubMed 20k RCT | SciBERT (Base Vocab) | F1 | 86.81 | #2 |
| Sentence Classification | SciCite | SciBERT | F1 | 84.9 | #1 |
| Citation Intent Classification | SciCite | SciBERT | F1 | 84.99 | #1 |
| Sentence Classification | ScienceCite | SciBERT (SciVocab) | F1 | 84.99 | #1 |
| Sentence Classification | ScienceCite | SciBERT (Base Vocab) | F1 | 84.43 | #2 |
| Relation Extraction | SciERC | SciBERT (Base Vocab) | F1 | 74.42 | #2 |
| Relation Extraction | SciERC | SciBERT (SciVocab) | F1 | 74.64 | #1 |
| Named Entity Recognition | SciERC | SciBERT (Base Vocab) | F1 | 65.24 | #5 |
| Named Entity Recognition | SciERC | SciBERT (SciVocab) | F1 | 67.57 | #4 |
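
Several of the results listed above report the same task and dataset twice, once for SciBERT (Base Vocab) and once for SciBERT (SciVocab). As a rough illustration of how the two variants compare, the following Python sketch computes the per-dataset F1 difference for those pairs. The scores are copied from the listing; the names `PAIRED_F1` and `vocab_deltas` are hypothetical helpers introduced here, not part of the SciBERT codebase, and this comparison is not the significance analysis the paper describes.

```python
# F1 scores copied from the results listed above.
# Key: (task, dataset); value: (Base Vocab F1, SciVocab F1).
# Only rows where both variants appear for the same task and dataset are included.
PAIRED_F1 = {
    ("Named Entity Recognition", "BC5CDR"): (88.11, 88.94),
    ("Participant Intervention Comparison Outcome Extraction", "EBM-NLP"): (70.82, 71.18),
    ("Dependency Parsing", "GENIA - LAS"): (91.26, 91.41),
    ("Dependency Parsing", "GENIA - UAS"): (92.32, 92.46),
    ("Named Entity Recognition", "NCBI-disease"): (86.88, 86.45),
    ("Sentence Classification", "Paper Field"): (64.02, 65.71),
    ("Sentence Classification", "ScienceCite"): (84.43, 84.99),
    ("Relation Extraction", "SciERC"): (74.42, 74.64),
    ("Named Entity Recognition", "SciERC"): (65.24, 67.57),
}


def vocab_deltas(paired):
    """Return SciVocab F1 minus Base Vocab F1 for each (task, dataset) pair."""
    return {key: round(sci - base, 2) for key, (base, sci) in paired.items()}


deltas = vocab_deltas(PAIRED_F1)
scivocab_wins = sum(1 for d in deltas.values() if d > 0)
mean_delta = sum(deltas.values()) / len(deltas)

# Print pairs from the largest SciVocab gain to the smallest.
for (task, dataset), delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{dataset:15s} {task:55s} {delta:+.2f}")
print(f"SciVocab ahead on {scivocab_wins} of {len(deltas)} pairs; mean delta {mean_delta:+.2f} F1")
```

In this listing NCBI-disease is the one pair where the Base Vocab variant scores higher. Rows without a matching counterpart (for example ChemProt, where the second entry is a fine-tuned variant) are left out of the comparison.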

Methods