SciBERT: A Pretrained Language Model for Scientific Text

26 Mar 2019 · Iz Beltagy, Kyle Lo, Arman Cohan

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018), to address the lack of high-quality, large-scale labeled scientific data...

Evaluation results from the paper


SOTA for Named Entity Recognition on NCBI-disease (using extra training data)

| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Sentence Classification | ACL-ARC | SciBERT (SciVocab) | F1 | 65.71 | #3 |
| Sentence Classification | ACL-ARC | SciBERT (Base Vocab) | F1 | 65.79 | #2 |
| Citation Intent Classification | ACL-ARC | SciBERT | F1 | 65.8 | #2 |
| Named Entity Recognition | BC5CDR | SciBERT (SciVocab) | F1 | 88.94 | #2 |
| Named Entity Recognition | BC5CDR | SciBERT (Base Vocab) | F1 | 88.11 | #3 |
| Relation Extraction | ChemProt | SciBERT (Base Vocab) | F1 | 73.70 | #2 |
| Relation Extraction | ChemProt | SciBERT (SciVocab) | F1 | 76.12 | #1 |
| Participant Intervention Comparison Outcome Extraction | EBM-NLP | SciBERT (Base Vocab) | F1 | 70.82 | #2 |
| Participant Intervention Comparison Outcome Extraction | EBM-NLP | SciBERT (SciVocab) | F1 | 71.18 | #1 |
| Dependency Parsing | GENIA - LAS | SciBERT (SciVocab) | F1 | 91.41 | #2 |
| Dependency Parsing | GENIA - LAS | SciBERT (Base Vocab) | F1 | 91.26 | #3 |
| Dependency Parsing | GENIA - UAS | SciBERT (SciVocab) | F1 | 92.46 | #2 |
| Dependency Parsing | GENIA - UAS | SciBERT (Base Vocab) | F1 | 92.32 | #3 |
| Named Entity Recognition | JNLPBA | SciBERT (SciVocab) | F1 | 75.95 | #3 |
| Named Entity Recognition | JNLPBA | SciBERT (Base Vocab) | F1 | 75.83 | #4 |
| Named Entity Recognition | NCBI-disease | SciBERT (SciVocab) | F1 | 86.45 | #3 |
| Named Entity Recognition | NCBI-disease | SciBERT (Base Vocab) | F1 | 86.91 | #2 |
| Sentence Classification | Paper Field | SciBERT (Base Vocab) | F1 | 64.02 | #2 |
| Sentence Classification | Paper Field | SciBERT (SciVocab) | F1 | 64.07 | #1 |
| Sentence Classification | PubMed 20k RCT | SciBERT (Base Vocab) | F1 | 86.80 | #3 |
| Sentence Classification | PubMed 20k RCT | SciBERT (SciVocab) | F1 | 86.81 | #2 |
| Sentence Classification | SciCite | SciBERT | F1 | 84.9 | #1 |
| Citation Intent Classification | SciCite | SciBERT | F1 | 84.99 | #1 |
| Sentence Classification | ScienceCite | SciBERT (Base Vocab) | F1 | 84.43 | #2 |
| Sentence Classification | ScienceCite | SciBERT (SciVocab) | F1 | 84.99 | #1 |
| Relation Extraction | SciERC | SciBERT (SciVocab) | F1 | 74.64 | #1 |
| Relation Extraction | SciERC | SciBERT (Base Vocab) | F1 | 74.42 | #2 |
| Named Entity Recognition | SciERC | SciBERT (Base Vocab) | F1 | 65.12 | #2 |
| Named Entity Recognition | SciERC | SciBERT (SciVocab) | F1 | 65.5 | #1 |