Datasets

8,877 machine learning datasets
Filter by Language (clear)
Udmurt English 2616 Chinese 304 German 167 French 142 Spanish 117 Russian 111 Arabic 79 Japanese 79 Italian 78 Portuguese 72 Hindi 64 Korean 54 Turkish 50 Vietnamese 45 Bengali 43 Persian 42 Dutch 41 Tamil 40 Polish 37 Czech 35 Indonesian 33 Danish 31 Telugu 30 Finnish 29 Romanian 29 Multilingual 28 Urdu 26 Marathi 25 Hungarian 23 Malayalam 22 Swedish 22 Thai 22 Greek 21 Estonian 20 Hebrew 20 Mandarin Chinese 20 Basque 19 Bulgarian 19 Gujarati 18 Kannada 17 Punjabi 17 Slovak 17 Ukrainian 16 Croatian 15 Norwegian 15 Slovenian 15 Catalan 14 Latvian 14 Lithuanian 13 Swahili 13 Assamese 12 Kazakh 12 Serbian 12 Amharic 11 Albanian 10 Armenian 10 Iranian Persian 10 Kurdish 10 Irish 9 Maltese 9 Oriya (macrolanguage) 9 Sinhala 9 Tagalog 9 Welsh 9 American Sign Language 8 Breton 8 Georgian 8 Icelandic 8 Macedonian 8 Mongolian 8 Sanskrit 8 Yoruba 8 Afrikaans 7 Azerbaijani 7 Burmese 7 Esperanto 7 Galician 7 Hausa 7 Igbo 7 Odia 7 Uzbek 7 Central Kurdish 6 Malay (individual language) 6 Oromo 6 Serbo-Croatian 6 Sindhi 6 Somali 6 Bambara 5 Belarusian 5 Egyptian Arabic 5 Filipino 5 Guarani 5 Haitian 5 Javanese 5 Latin 5 Malagasy 5 Nepali (macrolanguage) 5 Norwegian Bokmål 5 Norwegian Nynorsk 5 Quechua 5 Scottish Gaelic 5 Standard Arabic 5 Sundanese 5 Tigrinya 5 Wolof 5 Bosnian 4 Cebuano 4 Central Khmer 4 Chechen 4 Dhivehi 4 Fulah 4 Ganda 4 Iloko 4 Kabyle 4 Kinyarwanda 4 Kirghiz 4 Lao 4 Lingala 4 Nigerian Pidgin 4 South Azerbaijani 4 Tatar 4 Tibetan 4 Upper Sorbian 4 Western Panjabi 4 Aragonese 3 Bashkir 3 Bavarian 3 Bishnupriya 3 Chuvash 3 Erzya 3 Faroese 3 Fon 3 German Sign Language 3 Goan Konkani 3 Maithili 3 Malay (macrolanguage) 3 Nyanja 3 Romansh 3 Russia Buriat 3 Swati 3 Swiss German 3 Tajik 3 Tsonga 3 Tswana 3 Twi 3 Uighur 3 Waray (Philippines) 3 Xhosa 3 Yiddish 3 Argentine Sign Language 2 Asturian 2 Avaric 2 Aymara 2 Bangala 2 Bhojpuri 2 Central Bikol 2 Cherokee 2 Church Slavic 2 Cornish 2 Dimli (individual language) 2 Eastern Mari 2 Ewe 2 Gothic 2 Gulf Arabic 2 Ido 2 Interlingue 2 Inuktitut 2 Jejueo 2 Kalaallisut 2 Kalmyk 2 Karachay-Balkar 2 Komi 2 Komi-Permyak 2 Lezghian 2 Limburgan 2 Livvi 2 Lojban 2 Lombard 2 Low German 2 Lower Sorbian 2 Luo (Kenya and Tanzania) 2 Luxembourgish 2 Manipuri 2 Manx 2 Maori 2 Mazanderani 2 Minangkabau 2 Mingrelian 2 Mirandese 2 Modern Greek 2 Moksha 2 Moroccan Arabic 2 Naxi 2 Neapolitan 2 Nepali (individual language) 2 Newari 2 Northern Frisian 2 Northern Kurdish 2 Northern Luri 2 Northern Sami 2 Occitan (post 1500) 2 Ossetian 2 Pampanga 2 Pedi 2 Piemontese 2 Pushto 2 Rundi 2 Sardinian 2 Shona 2 Sichuan Yi 2 Sicilian 2 Southern Sotho 2 Swiss-German Sign Language 2 Tai 2 Tosk Albanian 2 Turkish Sign Language 2 Turkmen 2 Tuvinian 2 Venetian 2 Volapük 2 Walloon 2 Western Frisian 2 Western Mari 2 Wu Chinese 2 Yakut 2 Yue Chinese 2 Zulu 2 Abkhazian 1 Achinese 1 Adyghe 1 Afar 1 Akan 1 Akkadian 1 Akuntsu 1 Ancient Greek 1 Ancient Hebrew 1 Apurinã 1 Arpitan 1 Assyrian Neo-Aramaic 1 Bangladeshi Sign Language 1 Banjar 1 Bemba (Zambia) 1 Bislama 1 Bodo (India) 1 Buginese 1 Central Pashto 1 Chamorro 1 Chavacano 1 Cheyenne 1 Choctaw 1 Chukot 1 Congo Swahili 1 Coptic 1 Corsican 1 Cree 1 Creek 1 Crimean Tatar 1 Dogri (macrolanguage) 1 Dzongkha 1 Extremaduran 1 Fiji Hindi 1 Fijian 1 Friulian 1 Gagauz 1 Gan Chinese 1 Geez 1 Gilaki 1 Greek Sign Language 1 Hakha Chin 1 Hakka Chinese 1 Halh Mongolian 1 Hawaiian 1 Herero 1 Hiri Motu 1 Interlingua (International Auxiliary Language Association) 1 Inupiaq 1 Jamaican Creole English 1 Kabardian 1 Kabuverdianu 1 Kachin 1 Kanuri 1 Kara-Kalpak 1 Karelian 1 Kashmiri 1 Kashubian 1 Khunsari 1 Kikuyu 1 Komi-Zyrian 1 Kongo 1 Krio 1 Kuanyama 1 Kölsch 1 Ladino 1 Lak 1 Latgalian 1 Ligurian 1 Literary Chinese 1 Lozi 1 Lunda 1 Luo (Cameroon) 1 Lushai 1 Marshallese 1 Mbyá Guaraní 1 Mesopotamian Arabic 1 Min Dong Chinese 1 Modern Greek (1453-) 1 Mundurukú 1 Najdi Arabic 1 Narom 1 Nauru 1 Navajo 1 Nayini 1 Ndonga 1 Nigerian Fulfulde 1 North Azerbaijani 1 North Levantine Arabic 1 Northern Uzbek 1 Novial 1 Official Aramaic (700-300 BCE) 1 Old English (ca. 450-1100) 1 Old French 1 Old Russian 1 Old Turkish 1 Pali 1 Pangasinan 1 Papiamento 1 Pennsylvania German 1 Pfaelzisch 1 Picard 1 Pitcairn-Norfolk 1 Plateau Malagasy 1 Pontic 1 Rajasthani 1 Rusyn 1 Samoan 1 Sango 1 Saterfriesisch 1 Scots 1 Shan 1 Silesian 1 Skolt Sami 1 Soi 1 South Levantine Arabic 1 Southern Pashto 1 Sranan Tongo 1 Standard Latvian 1 Swahili (macrolanguage) 1 Swedish Sign Language 1 Tahitian 1 Tetum 1 Tok Pisin 1 Tonga (Tonga Islands) 1 Tonga (Zambia) 1 Tulu 1 Tumbuka 1 Tunisian Arabic 1 Tupinambá 1 Venda 1 Veps 1 Vlaams 1 Vlax Romani 1 Votic 1 Warlpiri 1 West Central Oromo 1 Zaza 1 Zeeuws 1 Zhuang 1 Dogri (individual language) 0 Northern Huishui Hmong 0 Portuguse 0 Saidi Arabic 0 Santali 0