Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models

In this paper, we carry out experimental research on Grammatical Error Correction (GEC), delving into the nuances of single-model systems, comparing the efficiency of ensembling and ranking methods, and exploring the application of large language models to GEC as single-model systems, as parts of ensembles, and as ranking methods. We set new state-of-the-art performance with F_0.5 scores of 72.8 on CoNLL-2014-test and 81.4 on BEA-test. To support further advancements in GEC and ensure the reproducibility of our research, we make our code, trained models, and systems' outputs publicly available.

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Grammatical Error Correction | BEA-2019 (test) | Majority-voting ensemble on best 7 models | F0.5 | 81.4 | # 1
Grammatical Error Correction | CoNLL-2014 Shared Task | Ensembles of best 7 models + GRECO + GPT-rerank | F0.5 | 72.8 | # 1
 | | | Precision | 83.9 | # 1
 | | | Recall | 47.5 | # 3
Grammatical Error Correction | CoNLL-2014 Shared Task | Majority-voting ensemble on best 7 models | F0.5 | 71.8 | # 2
 | | | Precision | 83.7 | # 2
 | | | Recall | 45.7 | # 5
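
The F0.5 values reported for CoNLL-2014 can be sanity-checked against the listed precision and recall using the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which with beta = 0.5 weights precision more heavily than recall. The sketch below is only an illustration: the function name `f_beta` and the dictionary labels are made up for this example, and it recombines the rounded percentages from the table rather than the edit counts an official scorer would use, so small rounding differences are possible in general.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) in percent, copied from the CoNLL-2014 rows.
# The labels are illustrative shorthand, not official system names.
reported = {
    "ensemble with reranking": (83.9, 47.5),
    "majority-voting ensemble": (83.7, 45.7),
}

# Round to one decimal place, matching the precision of the table.
scores = {name: round(f_beta(p, r), 1) for name, (p, r) in reported.items()}

for name, score in scores.items():
    print(f"{name}: F0.5 = {score}")
# ensemble with reranking: F0.5 = 72.8
# majority-voting ensemble: F0.5 = 71.8
```

Both recomputed values agree with the F0.5 column, which also shows why the first system ranks higher overall despite similar precision: its recall is about two points better.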
