3 dataset results for Grammatical Error Correction AND Chinese

MuCGEC (Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction)

MuCGEC is a multi-reference, multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC). It consists of 7,063 sentences collected from three different Chinese-as-a-Second-Language (CSL) learner sources. Each sentence was corrected by three annotators, and their corrections were carefully reviewed by an expert, yielding an average of 2.3 references per sentence.

14 PAPERS • 1 BENCHMARK

FCGEC (FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction)

A fine-grained corpus for detecting, identifying, and correcting Chinese grammatical errors. It is collected mainly from multiple-choice questions in public-school Chinese examinations and provides multiple references per sentence. Online evaluation site for the test set: https://codalab.lisn.upsaclay.fr/competitions/8020

1 PAPER • 1 BENCHMARK

NaSGEC

NaSGEC is a new dataset to facilitate research on Chinese grammatical error correction (CGEC) for native-speaker texts from multiple domains. Previous CGEC research has focused primarily on correcting texts from a single domain, especially learner essays.

1 PAPER • NO BENCHMARKS YET
