The LTI LangID Corpus is a multilingual text dataset for language identification (LangID). It has had multiple releases; the first contained 781 "core" languages and 1091 languages overall.