Datasets: 9,828 machine learning datasets
Filter by Modality
Texts (selected)
Filter by Task
Czech Text Diacritization (selected)
Croatian Text Diacritization: 1
French Text Diacritization: 1
Hungarian Text Diacritization: 1
Irish Text Diacritization: 1
Latvian Text Diacritization: 1
Polish Text Diacritization: 1
Romanian Text Diacritization: 1
Slovak Text Diacritization: 1
Spanish Text Diacritization: 1
Turkish Text Diacritization: 1
Vietnamese Text Diacritization: 1
Filter by Language
Latvian (selected)
Croatian: 1
Czech: 1
French: 1
Hungarian: 1
Irish: 1
Polish: 1
Romanian: 1
Slovak: 1
Spanish: 1
Turkish: 1
Vietnamese: 1
1 dataset result for: segmentation AND Czech Text Diacritization AND Texts AND Latvian
Multilingual Dataset for Training and Evaluating Diacritics Restoration Systems
…Data are segmented into sentences, which are then tokenized into words.
2 papers • 12 benchmarks
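The dataset description says only that the data are segmented into sentences and then word-tokenized. It does not name the tools used. The following is a minimal sketch of that two-step preprocessing, assuming a naive regex sentence splitter and word tokenizer. These are hypothetical stand-ins, not the dataset's actual pipeline. The helpers `strip_diacritics` and `make_training_pairs` are also illustrative assumptions. They show one common way to derive a diacritics-free input token from each original token for a restoration task: Unicode NFD decomposition followed by removal of combining marks.

```python
import re
import unicodedata

# Hypothetical sentence boundary: whitespace that follows ".", "!" or "?".
# Real segmenters also handle abbreviations, quotes, numbers, etc.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Hypothetical word tokenizer: runs of Unicode word characters, with each
# punctuation mark emitted as a separate token.
TOKEN = re.compile(r"\w+|[^\w\s]")


def segment_sentences(text: str) -> list[str]:
    """Split raw text into sentences (NFC-normalized so accented letters stay whole)."""
    text = unicodedata.normalize("NFC", text.strip())
    return [s for s in SENTENCE_BOUNDARY.split(text) if s]


def tokenize_words(sentence: str) -> list[str]:
    """Split one sentence into word and punctuation tokens."""
    return TOKEN.findall(unicodedata.normalize("NFC", sentence))


def strip_diacritics(token: str) -> str:
    """Remove combining marks after NFD decomposition (e.g. "č" -> "c").

    Letters without a canonical decomposition, such as Polish "ł" or
    Croatian "đ", are left unchanged by this approach.
    """
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_training_pairs(text: str) -> list[list[tuple[str, str]]]:
    """For each sentence, pair every stripped token with its original form."""
    return [
        [(strip_diacritics(tok), tok) for tok in tokenize_words(sentence)]
        for sentence in segment_sentences(text)
    ]


if __name__ == "__main__":
    for sentence in make_training_pairs("Dobrý den. Jak se máte?"):
        print(sentence)
```

Both text helpers normalize to NFC first. In NFD, a combining accent is not matched by `\w`, so it would split a word such as "máte" into separate tokens. As the docstring notes, the NFD-based stripping misses letters like "ł" and "đ". A production pipeline would need an explicit mapping for those characters.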