ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ICLR 2020 Anonymous

While masked language modeling (MLM) pre-training methods such as BERT produce excellent results on downstream NLP tasks, they require large amounts of compute to be effective. These approaches corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. As a more sample-efficient alternative, ELECTRA corrupts the input by replacing some tokens with plausible alternatives sampled from a small generator network, and trains a discriminative model to predict whether each token in the corrupted input was replaced by a generator sample or left unchanged. Because this replaced token detection task is defined over all input positions rather than only the masked subset, it makes better use of each training example.
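
The contrast between the two pre-training objectives is easiest to see in how the training data is constructed. The sketch below is illustrative only, not the authors' implementation: the "generator" here samples replacements uniformly at random (where ELECTRA uses a small masked language model), and the toy vocabulary, function names, and 15% corruption rate are assumptions for the example.

```python
# Minimal sketch of MLM-style vs. ELECTRA-style input corruption.
# Not the authors' code: the generator is a uniform-random stand-in for
# ELECTRA's small generator network.
import random

VOCAB = ["the", "chef", "cooked", "a", "meal", "dog", "ate", "soup"]  # toy vocabulary


def corrupt_for_mlm(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """BERT-style corruption: replace ~15% of tokens with [MASK].
    The model is trained to reconstruct the original token at masked
    positions only; unmasked positions contribute no loss."""
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            corrupted.append(mask_token)
            targets.append(tok)      # reconstruction target
        else:
            corrupted.append(tok)
            targets.append(None)     # no loss at this position
    return corrupted, targets


def corrupt_for_replaced_token_detection(tokens, replace_prob=0.15):
    """ELECTRA-style corruption: replace some tokens with plausible
    alternatives (here sampled uniformly as a placeholder for the small
    generator network). The discriminator receives a binary label for
    EVERY position: 1 if the token was replaced, 0 if it is original."""
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < replace_prob:
            replacement = random.choice(VOCAB)
            corrupted.append(replacement)
            # If the generator happens to sample the original token,
            # the position counts as "original".
            labels.append(int(replacement != tok))
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


if __name__ == "__main__":
    sentence = ["the", "chef", "cooked", "the", "meal"]
    print(corrupt_for_mlm(sentence))
    print(corrupt_for_replaced_token_detection(sentence))
```

Note that the MLM targets are defined only where a [MASK] was inserted, whereas the replaced-token-detection labels cover every token in the sequence, which is the source of ELECTRA's sample efficiency.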


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Linguistic Acceptability | CoLA | ELECTRA | Accuracy | 68.2% | # 5 |
| Semantic Textual Similarity | MRPC | ELECTRA | Accuracy | 89.6% | # 8 |
| Natural Language Inference | QNLI | ELECTRA | Accuracy | 95.4% | # 5 |
| Question Answering | Quora Question Pairs | ELECTRA | Accuracy | 90.1% | # 5 |
| Natural Language Inference | RTE | ELECTRA | Accuracy | 83.6% | # 7 |
| Sentiment Analysis | SST-2 Binary classification | ELECTRA | Accuracy | 96.9% | # 4 |
| Semantic Textual Similarity | STS Benchmark | ELECTRA (no tricks) | Pearson Correlation | 0.910 | # 5 |
| Semantic Textual Similarity | STS Benchmark | ELECTRA | Pearson Correlation | 0.921 | # 3 |