Comparison of Grammatical Error Correction Using Back-Translation Models

Grammatical error correction (GEC) suffers from a lack of sufficient parallel data. GEC studies have therefore developed various methods for generating pseudo data, which consist of pairs of grammatical sentences and artificially produced ungrammatical sentences. Currently, a mainstream approach to generating pseudo data is back-translation (BT). Most previous GEC studies using BT have employed the same architecture for both the GEC and BT models. However, GEC models have different correction tendencies depending on their architectures. Thus, in this study, we compare the correction tendencies of GEC models trained on pseudo data generated by three different BT models: Transformer, CNN, and LSTM. The results confirm that the correction tendencies for each error type differ across BT models. Additionally, we examine the correction tendencies when using a combination of pseudo data generated by different BT models. We find that combining different BT models improves or interpolates the F_0.5 scores of each error type compared with those of single BT models with different seeds.
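
For readers unfamiliar with the metric, the F_0.5 score mentioned above is the standard F-beta measure with beta = 0.5, which weights precision more heavily than recall. The following is a minimal sketch of that formula, not the paper's evaluation code: the function name `f_beta`, the edit-level counts (`tp`, `fp`, `fn`), and the choice to return 0.0 when a denominator is empty are all assumptions made for illustration, and real GEC scorers may handle the empty cases differently.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Compute F_beta from counts of correct, spurious, and missed edits.

    tp: proposed edits that match the reference (hypothetical count)
    fp: proposed edits that do not match the reference
    fn: reference edits the system failed to propose
    beta < 1 weights precision above recall; beta = 0.5 gives F_0.5.
    """
    # Assumption: an empty denominator yields 0.0 rather than 1.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(round(f_beta(tp=6, fp=2, fn=6), 4))
```

With these illustrative counts, precision is 0.75 and recall is 0.5, so F_0.5 lands closer to precision than the balanced F_1 would. Computing the score separately per error type, as the abstract describes, would amount to calling such a function once per type with that type's counts.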

NAACL 2021
