Grammatical Error Correction in Low-Resource Scenarios

WS 2019  ·  Jakub Náplava, Milan Straka

Grammatical error correction in English is a long-studied problem with many existing systems and datasets. However, there has been only limited research on error correction for other languages. In this paper, we present AKCES-GEC, a new dataset for grammatical error correction in Czech. We then conduct experiments on Czech, German, and Russian and show that, when a synthetic parallel corpus is used, a Transformer neural machine translation model can reach new state-of-the-art results on these datasets. AKCES-GEC is published under the CC BY-NC-SA 4.0 license at https://hdl.handle.net/11234/1-3057, and the source code of the GEC model is available at https://github.com/ufal/low-resource-gec-wnut2019.


Datasets


Introduced in the Paper:

AKCES-GEC

Used in the Paper:

FCE

Results from the Paper


Ranked #2 on Grammatical Error Correction on Falko-MERLIN (using extra training data)

Task                          Dataset       Model                                  Metric  Value  Global Rank
Grammatical Error Correction  Falko-MERLIN  Transformer                            F0.5    73.71  #2
Grammatical Error Correction  Falko-MERLIN  Transformer - synthetic pretrain only  F0.5    51.41  #3
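
The F0.5 metric reported above is the F-beta score with beta = 0.5, which weights precision more heavily than recall. The following is a minimal sketch of that formula only: the function name `f_beta` and the edit counts in the usage lines are illustrative assumptions, and the sketch does not reproduce the evaluation tooling behind the scores listed here.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from counts of correct, spurious, and missed edits.

    tp: proposed edits that match the reference (true positives)
    fp: proposed edits that do not match the reference (false positives)
    fn: reference edits the system failed to propose (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, chosen only to illustrate the formula.
score = f_beta(tp=60, fp=20, fn=40)       # precision 0.75, recall 0.60
print(f"F0.5 = {100 * score:.2f}")
```

With beta below 1, a system that proposes fewer but more reliable corrections scores higher than one that trades precision for recall, which is the usual motivation for F0.5 in grammatical error correction.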

Methods