Using Wikipedia Edits in Low Resource Grammatical Error Correction

WS 2018 · Adriane Boyd

We develop a grammatical error correction (GEC) system for German using a small gold GEC corpus augmented with edits extracted from Wikipedia revision history. We extend the automatic error annotation tool ERRANT (Bryant et al., 2017) to German and use it to analyze both the gold GEC corrections and Wikipedia edits (Grundkiewicz and Junczys-Dowmunt, 2014). This analysis lets us select, as additional training data, the Wikipedia edits whose grammatical corrections are similar to those in the gold corpus. Using a multilayer convolutional encoder-decoder neural network GEC approach (Chollampatt and Ng, 2018), we evaluate the contribution of the Wikipedia edits and find that carefully selected Wikipedia edits increase performance by over 5%.
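The selection idea (keep Wikipedia edits that resemble the gold corrections) can be pictured with a minimal sketch. It rests on several assumptions that do not come from the paper:

* Revisions are assumed to be already split into tokenized sentence pairs.
* Python's `difflib` stands in, crudely, for ERRANT's linguistically informed alignment.
* The coarse ERRANT-style labels (M for missing, U for unnecessary, R for replacement, R:ORTH for case-only changes) are simplified.
* The filter and its `min_share` threshold are hypothetical. The filter keeps a pair only if every one of its edit types is reasonably frequent in the gold corpus.

This is an illustration, not the paper's exact selection criterion.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_edits(orig, corr):
    """Return (start, end, orig_span, corr_span) token edits between two sentences.
    difflib is a simple stand-in for ERRANT's alignment (assumption)."""
    src, tgt = orig.split(), corr.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((i1, i2, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


def coarse_type(edit):
    """Hypothetical coarse edit type, loosely modelled on ERRANT operation labels."""
    _, _, o, c = edit
    if not o:
        return "M"        # token(s) missing in the original
    if not c:
        return "U"        # unnecessary token(s) deleted
    if o.lower() == c.lower():
        return "R:ORTH"   # case-only replacement
    return "R"            # any other replacement


def type_distribution(pairs):
    """Relative frequency of each coarse edit type over a list of sentence pairs."""
    counts = Counter(coarse_type(e) for o, c in pairs for e in extract_edits(o, c))
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


def select_wiki_pairs(wiki_pairs, gold_pairs, min_share=0.05):
    """Keep Wikipedia pairs with at least one edit, all of whose edit types have
    a share >= min_share in the gold corpus (threshold is hypothetical)."""
    gold = type_distribution(gold_pairs)
    kept = []
    for o, c in wiki_pairs:
        edits = extract_edits(o, c)
        if edits and all(gold.get(coarse_type(e), 0.0) >= min_share for e in edits):
            kept.append((o, c))
    return kept


# Toy data (invented for illustration only).
gold_pairs = [
    ("Ich habe ein Hund .", "Ich habe einen Hund ."),    # R
    ("Das ist gut", "Das ist gut ."),                    # M
    ("ich komme morgen .", "Ich komme morgen ."),        # R:ORTH
]
wiki_pairs = [
    ("Der Katze schläft .", "Die Katze schläft ."),      # R: seen in gold, kept
    ("Berlin ist eine sehr sehr große Stadt .",
     "Berlin ist eine sehr große Stadt ."),              # U: unseen in gold, dropped
    ("Der Film war gut .", "Der Film war gut ."),        # no edit, dropped
]
selected = select_wiki_pairs(wiki_pairs, gold_pairs)
print(selected)
```

A type-level filter like this is cheap and transparent. It also has an obvious weakness: a content change (for example, an inserted date) gets the same coarse label as a grammatical insertion. This is why finer-grained, linguistically informed error types such as those produced by ERRANT are more useful for this kind of selection.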

Results from the Paper


Ranked #4 on Grammatical Error Correction on Falko-MERLIN (using extra training data)

Task: Grammatical Error Correction
Dataset: Falko-MERLIN
Model: Multilayer Convolutional Encoder-Decoder
Metric: F0.5
Metric value: 43.35
Global rank: #4
Uses extra training data: Yes
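The reported metric, F0.5, weights precision more heavily than recall. It is defined as F_beta = (1 + beta^2) * P * R / (beta^2 * P + R) with beta = 0.5. Below is a minimal sketch of how it follows from edit-level counts. It assumes system and gold edits are matched exactly as (start, end, correction) tuples, which is a simplification. Standard GEC scorers such as the M2 scorer or ERRANT handle multiple annotators and edit extraction more carefully. The convention of returning 1.0 for an empty denominator is also an assumption of this sketch.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """Precision, recall and F-beta from edit counts; beta=0.5 favours precision.
    An empty denominator yields 1.0 (a common convention, assumed here)."""
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    b2 = beta * beta
    f = (1 + b2) * p * r / (b2 * p + r) if p + r else 0.0
    return p, r, f


def score_edits(system, gold, beta=0.5):
    """Compare edit sets by exact match: true positives are shared edits."""
    sys_set, gold_set = set(system), set(gold)
    tp = len(sys_set & gold_set)
    fp = len(sys_set - gold_set)   # proposed but not in gold
    fn = len(gold_set - sys_set)   # in gold but missed
    return f_beta(tp, fp, fn, beta)


# Toy edits as (start, end, correction) tuples, invented for illustration.
gold = {(2, 3, "einen"), (5, 5, "."), (0, 1, "Ich")}
system = {(2, 3, "einen"), (0, 1, "Ich"), (4, 5, "")}
p, r, f = score_edits(system, gold)
print(f"P={p:.3f} R={r:.3f} F0.5={f:.3f}")
```

Precision and recall are equal here (2 of 3 each), so F0.5 is also 2/3. With P = 0.75 and R = 0.5, F0.5 is about 0.68, while F1 would be 0.6. This shows how beta = 0.5 rewards a conservative system that proposes fewer but more reliable corrections.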