A large, crowd-sourced dataset for the Native Language Identification (NLI) task. People learning English as a second language write practice Notebooks, which can be used to identify the author's native language from word choice, spelling mistakes, and other linguistic features.

The dataset has:

  • 11 languages (Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, Turkish)
  • 111,917 documents
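
To make the task concrete, here is a minimal sketch of how a native-language classifier might use surface cues such as spelling and word choice. It is not the method of any paper associated with this dataset: it swaps in a simple multinomial Naive Bayes over character trigrams, written with the Python standard library only. The class name NgramNaiveBayes, the helper char_ngrams, the labels lang_a and lang_b, and the four training sentences are all hypothetical placeholders invented for illustration; they are not drawn from the dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical toy classifier: multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training documents
        self.vocab = set()                  # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        grams = char_ngrams(text, self.n)
        best_label, best_score = None, -math.inf
        for label in self.docs:
            total = sum(self.counts[label].values())
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                score += math.log(
                    (self.counts[label][gram] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences with placeholder labels (assumptions, not dataset samples).
train_texts = [
    "I am agree with this opinion",
    "I am agree that he explain me the rule",
    "I have many informations about it",
    "She gave me many informations and advices",
]
train_labels = ["lang_a", "lang_a", "lang_b", "lang_b"]

model = NgramNaiveBayes(n=3).fit(train_texts, train_labels)
print(model.predict("I am agree with you"))
```

Character n-grams are used here because they pick up both spelling patterns and short function-word sequences without needing a tokenizer. A real experiment on 111,917 documents would call for a held-out test split and a stronger model.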

License


  • Unknown
