TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Natural Language Inference	MedNLI	BioELECTRA-Base	Accuracy	86.34	# 2
Question Answering	PubMedQA	BioELECTRA uncased	Accuracy	64.2	# 21
Medical Named Entity Recognition	ShARe/CLEF eHealth corpus	BioELECTRA	F1	0.8371	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bioelectra-pretrained-biomedical-text-encoder/medical-named-entity-recognition-on-share)](https://paperswithcode.com/sota/medical-named-entity-recognition-on-share?p=bioelectra-pretrained-biomedical-text-encoder)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bioelectra-pretrained-biomedical-text-encoder/natural-language-inference-on-mednli)](https://paperswithcode.com/sota/natural-language-inference-on-mednli?p=bioelectra-pretrained-biomedical-text-encoder)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bioelectra-pretrained-biomedical-text-encoder/question-answering-on-pubmedqa)](https://paperswithcode.com/sota/question-answering-on-pubmedqa?p=bioelectra-pretrained-biomedical-text-encoder)`

BioELECTRA:Pretrained Biomedical text Encoder using Discriminators

ACL Anthology 2021 · Kamal raj Kanakarajan, Bhuvana Kundumani, Malaikannan Sankarasubbu

Recent advancements in pretraining strategies in NLP have shown a significant improvement in the performance of models on various text mining tasks. We apply the ‘replaced token detection’ pretraining technique proposed by ELECTRA and pretrain a biomedical language model from scratch using biomedical text and vocabulary. We introduce BioELECTRA, a biomedical domain-specific language encoder model that adapts ELECTRA for the biomedical domain. We evaluate our model on the BLURB and BLUE biomedical NLP benchmarks. BioELECTRA outperforms the previous models and achieves state of the art (SOTA) on all 13 datasets in the BLURB benchmark and on all 4 clinical datasets from the BLUE benchmark across 7 different NLP tasks. BioELECTRA, pretrained on PubMed and PMC full-text articles, also performs very well on clinical datasets. BioELECTRA achieves a new SOTA of 86.34% (1.39% accuracy improvement) on MedNLI and 64% (2.98% accuracy improvement) on the PubMedQA dataset.
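
The ‘replaced token detection’ objective mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `corrupt_tokens`, the toy vocabulary, the replacement probability, and the uniform sampler standing in for ELECTRA's learned generator are all assumptions made for illustration. The sketch only shows how per-token discriminator labels could be built, with a token labelled as replaced only when it differs from the original.

```python
import random


def corrupt_tokens(tokens, vocab, replace_prob=0.15, seed=0):
    """Build a toy replaced-token-detection example.

    Returns (corrupted_tokens, labels), where labels[i] is 1 if the token
    at position i differs from the original and 0 otherwise.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            # Assumption: uniform sampling stands in for a learned generator.
            new_tok = rng.choice(vocab)
        else:
            new_tok = tok
        corrupted.append(new_tok)
        # A sampled token identical to the original counts as "original".
        labels.append(int(new_tok != tok))
    return corrupted, labels


# Hypothetical sentence and vocabulary, chosen only for the demonstration.
tokens = ["the", "patient", "was", "given", "aspirin", "for", "chest", "pain"]
vocab = ["the", "patient", "aspirin", "ibuprofen", "pain", "fever", "given", "was"]

corrupted, labels = corrupt_tokens(tokens, vocab, replace_prob=0.3, seed=1)
for original, new, label in zip(tokens, corrupted, labels):
    print(f"{original:>8} -> {new:<10} label={label}")
```

A discriminator would then be trained to predict `labels` from `corrupted`, so every position contributes to the loss rather than only the masked ones.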

Code

kamalkraj/BioELECTRA

Tasks

Language Modelling

Medical Named Entity Recognition

Natural Language Inference

Question Answering

Sentence Similarity

Datasets

SQuAD

BC5CDR

BioASQ

NCBI Disease

PubMedQA

BLUE

DDI

BIOSSES

BLURB

HOC

ChemProt

MedNLI

PubMed PICO Element Detection Dataset

Results from the Paper

Ranked #1 on Medical Named Entity Recognition on ShARe/CLEF eHealth corpus

Methods

Adam • Attention Dropout • Dense Connections • Dropout • ELECTRA • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
