FCE (First Certificate in English)

Introduced by Helen Yannakoudakis et al. in A New Dataset and Method for Automatically Grading ESOL Texts

The Cambridge Learner Corpus First Certificate in English (CLC FCE) dataset consists of short texts, written by learners of English as an additional language in response to exam prompts eliciting free-text answers and assessing mastery of the upper-intermediate proficiency level. The texts have been manually error-annotated using a taxonomy of 77 error types. The full dataset consists of 323,192 sentences. The publicly released subset of the dataset, named FCE-public, consists of 33,673 sentences split into test and training sets of 2,720 and 30,953 sentences, respectively.

Source: Compositional Sequence Labeling Models for Error Detection in Learner Writing


Paper Code Results Date Stars

Dataset Loaders

No data loaders found. You can submit your data loader here.


Similar Datasets