Developing NLP Tools with a New Corpus of Learner Spanish

The development of effective NLP tools for the L2 classroom depends largely on the availability of large annotated corpora of language learner text. While annotated learner corpora of English are widely available, large learner corpora of Spanish are less common. Those Spanish corpora that are available do not contain the annotations needed to facilitate the development of tools beneficial to language learners, such as grammatical error correction. As a result, the field has seen little research in NLP tools designed to benefit Spanish language learners and teachers. We introduce COWS-L2H, a freely available corpus of Spanish learner data which includes error annotations and parallel corrected text to help researchers better understand L2 development, to examine teaching practices empirically, and to develop NLP tools to better serve the Spanish teaching community. We demonstrate the utility of this corpus by developing a neural-network based grammatical error correction system for Spanish learner writing.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here