italki NLI

Introduced by Hudson et al. in On the Development of a Large Scale Corpus for Native Language Identification

A large, crowd-sourced dataset for the Native Language Identification (NLI) task. People learning English as a second language write practice Notebooks, and these texts can be used to classify the author's native language from word choice, spelling mistakes, and other linguistic features.
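To illustrate how surface features such as spelling and word choice can drive this kind of classification, here is a minimal sketch in Python. It is not the method from the paper: it assumes a multinomial Naive Bayes classifier over character trigrams, and the names `char_ngrams` and `NaiveBayesNLI`, the labels `L1_A` and `L1_B`, and the sample sentences are all hypothetical and not taken from the dataset.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NaiveBayesNLI:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}             # label -> Counter of n-gram counts
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()           # every n-gram seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # Smoothed denominator: n-gram count for the label plus vocabulary size.
            denom = sum(counter.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for gram, freq in grams.items():
                score += freq * math.log((counter[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented placeholder samples, NOT drawn from the italki corpus.
samples = [
    ("I am agree with this opinion", "L1_A"),
    ("She have many informations", "L1_B"),
]
model = NaiveBayesNLI().fit([t for t, _ in samples], [l for _, l in samples])
print(model.predict("I am agree"))
```

Character n-grams are used here because they capture misspellings and word fragments without any tokenization; a practical system would need far more training text per language and would likely combine several feature types.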

The dataset has: