5 dataset results for Morphological Analysis

CELEX

The CELEX database comprises three separate searchable lexical databases: Dutch, English and German. The lexical data in each database is divided into five categories: orthography, phonology, morphology, syntax (word class) and word frequency.

57 PAPERS • NO BENCHMARKS YET

Polyglot-NER

Polyglot-NER is a massively multilingual named entity recognition (NER) dataset whose annotators were built automatically, with minimal human expertise and intervention.

9 PAPERS • NO BENCHMARKS YET

Wikipedia Title

Wikipedia Title is a dataset for learning character-level compositionality from the visual characteristics of characters. It consists of Wikipedia titles, each written in Chinese, Japanese or Korean and labelled with the category to which its article belongs.

3 PAPERS • NO BENCHMARKS YET

Egyptian Arabic Segmentation Dataset

The dataset contains 350 tweets written in the Egyptian dialect, totalling more than 8,000 words, of which 3,000 are unique. The tweets are rich in dialectal content and cover most Egyptian dialectal phonological, morphological and syntactic phenomena. The text also retains Twitter-specific elements such as #hashtags, @mentions, emoticons and URLs.

1 PAPER • NO BENCHMARKS YET
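Because the Egyptian Arabic tweets keep hashtags, mentions, emoticons and URLs, a segmentation pipeline usually has to recognise those tokens before splitting ordinary words. The sketch below is a minimal illustration under stated assumptions: the regular expressions and the `tag_tokens` helper are hypothetical and are not the dataset's own tokenization rules.

```python
import re

# Hypothetical patterns for the Twitter-specific tokens in the dataset.
# They are illustrative assumptions, not the dataset's official rules.
TOKEN_PATTERNS = [
    ("URL", r"https?://\S+"),
    ("HASHTAG", r"#\w+"),
    ("MENTION", r"@\w+"),
    ("EMOTICON", r"[:;=][-']?[()DPp]"),
    ("WORD", r"\w+"),  # \w matches Arabic letters in Python 3 str patterns
]

# One alternation of named groups; patterns listed earlier are tried first.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tag_tokens(text: str) -> list[tuple[str, str]]:
    """Return (token, type) pairs; only WORD tokens would go on to segmentation."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


print(tag_tokens("@user شكرا #مصر :) https://t.co/x"))
```

Keeping URLs first in the alternation stops a link's `https` prefix from being tagged as an ordinary word; a real pipeline would likely need broader emoticon and URL patterns.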

TrMor2018

TrMor2018 is a new, high-accuracy Turkish morphology dataset.

1 PAPER • NO BENCHMARKS YET
