GECToR -- Grammatical Error Correction: Tag, Not Rewrite

In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and then on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an $F_{0.5}$ of 65.3/66.5 on CoNLL-2014 (test) and an $F_{0.5}$ of 72.4/73.6 on BEA-2019 (test). Inference is up to ten times faster than that of a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
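To make the tagging idea concrete, here is a minimal sketch of applying token-level edit tags to a source sentence. The tag names ($KEEP, $DELETE, $REPLACE_<token>, $APPEND_<token>) follow the paper's basic transformations, but this simplified applier is illustrative only; the full system also defines g-transformations (e.g. case and verb-form changes) and applies tags iteratively over several passes.

```python
def apply_tags(tokens, tags):
    """Apply one edit tag per source token; return the corrected token list.

    Illustrative tag set: $KEEP, $DELETE, $REPLACE_<token>, $APPEND_<token>.
    """
    out = []
    for tok, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(tok)
        elif tag == "$DELETE":
            continue  # drop the source token
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])  # substitute a new token
        elif tag.startswith("$APPEND_"):
            out.append(tok)
            out.append(tag[len("$APPEND_"):])  # insert a token after this one
        else:
            out.append(tok)  # unknown tag: keep the token unchanged
    return out

src = ["She", "go", "to", "school", "yesterday"]
tags = ["$KEEP", "$REPLACE_went", "$KEEP", "$KEEP", "$KEEP"]
print(" ".join(apply_tags(src, tags)))  # She went to school yesterday
```

Because each source token receives exactly one tag, decoding is a single parallel classification step per pass, which is where the speedup over autoregressive seq2seq decoding comes from.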

Published at WS 2020.
| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Grammatical Error Correction | BEA-2019 (test) | Sequence tagging + token-level transformations + two-stage fine-tuning (+BERT, RoBERTa, XLNet) | F0.5 | 73.6 | #1 |
| Grammatical Error Correction | BEA-2019 (test) | Sequence tagging + token-level transformations + two-stage fine-tuning (+XLNet) | F0.5 | 72.4 | #4 |
| Grammatical Error Correction | CoNLL-2014 Shared Task | Sequence tagging + token-level transformations + two-stage fine-tuning (+BERT, RoBERTa, XLNet) | F0.5 | 66.5 | #2 |
| | | | Precision | 78.2 | #1 |
| | | | Recall | 41.5 | #2 |
| Grammatical Error Correction | CoNLL-2014 Shared Task | Sequence tagging + token-level transformations + two-stage fine-tuning (+XLNet) | F0.5 | 65.3 | #4 |
| | | | Precision | 77.5 | #2 |
| | | | Recall | 40.1 | #3 |
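The $F_{0.5}$ values above follow from the reported precision and recall via the standard $F_\beta$ formula with $\beta = 0.5$, which weights precision twice as heavily as recall. A quick check against the CoNLL-2014 ensemble row (P = 78.2, R = 41.5):

```python
def f_beta(precision, recall, beta=0.5):
    """F_beta score; beta < 1 favors precision over recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Reported CoNLL-2014 ensemble numbers: P = 78.2, R = 41.5.
print(round(f_beta(78.2, 41.5), 1))  # 66.4, matching the reported 66.5 up to rounding of P and R
```

Precision-weighted $F_{0.5}$ is the conventional GEC metric because unnecessary "corrections" are considered more harmful than missed errors.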
