Datasets
9,824 machine learning datasets
1 dataset result for segmentation AND Hungarian Text Diacritization AND Texts
Multilingual Dataset for Training and Evaluating Diacritics Restoration Systems
…Data are segmented into sentences which are further word tokenized.
2 PAPERS • 12 BENCHMARKS