The LTI LangID Corpus is a text dataset for language identification (LangID). It has had multiple releases; the first release covers 1091 languages overall, 781 of which are designated "core" languages.

License


  • Unknown
