MuCGEC is a multi-reference multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three different Chinese-as-a-Second-Language (CSL) learner sources. Each sentence has been corrected by three annotators, and their corrections are meticulously reviewed by an expert, resulting in 2.3 references per sentence.
14 PAPERS • 1 BENCHMARK
a fine-grained corpus to detect, identify and correct the chinese grammatical errors. collected mainly from multi-choice questions in public school Chinese examinations with multiple references Online Evaluation Site for test set: https://codalab.lisn.upsaclay.fr/competitions/8020
1 PAPER • 1 BENCHMARK
NaSGEC is a new dataset to facilitate research on Chinese grammatical error correction (CGEC) for native speaker texts from multiple domains. Previous CGEC research primarily focuses on correcting texts from a single domain, especially learner essays.
1 PAPER • NO BENCHMARKS YET