Contains 350 tweets with more than 8,000 words, including 3,000 unique words, written in Egyptian dialect. The tweets are rich in dialectal content, covering most Egyptian dialectal phonological, morphological, and syntactic phenomena. The dataset also includes Twitter-specific aspects of the text, such as #hashtags, @mentions, emoticons and URLs.
1 PAPER • NO BENCHMARKS YET
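For the tweet corpus described above, the Twitter-specific elements (#hashtags, @mentions, URLs) can be separated from the remaining text before any linguistic processing. The following is a minimal sketch only: the regular expressions and the function name `split_twitter_tokens` are illustrative assumptions, not the dataset's own tokenization rules.

```python
import re

# Hypothetical patterns chosen for illustration; the dataset's actual
# annotation guidelines may treat these elements differently.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def split_twitter_tokens(tweet: str) -> dict:
    """Separate URLs, mentions and hashtags from the rest of a tweet."""
    urls = URL_RE.findall(tweet)
    # Strip URLs first so '#' fragments inside them are not read as hashtags.
    rest = URL_RE.sub(" ", tweet)
    mentions = MENTION_RE.findall(rest)
    hashtags = HASHTAG_RE.findall(rest)
    rest = HASHTAG_RE.sub(" ", MENTION_RE.sub(" ", rest))
    return {
        "urls": urls,
        "mentions": mentions,
        "hashtags": hashtags,
        # Whatever is left: ordinary words, emoticons, punctuation.
        "other": rest.split(),
    }


if __name__ == "__main__":
    print(split_twitter_tokens("@sam good morning #cairo http://t.co/abc :)"))
```

In Python 3, `\w` matches Unicode word characters, so the same patterns apply to hashtags written in Arabic script. Emoticons are deliberately left in the `other` list, since recognising them reliably needs a dedicated lexicon.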
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension
…It includes tasks such as word segmentation, part-of-speech tagging, reading comprehension and document retrieval.
4 PAPERS • NO BENCHMARKS YET
…Annotations include: multiple POS tags, morphological features and lemmatization; sentence segmentation and rough speech act; document structure in TEI XML (paragraphs, headings, figures, etc.).
8 PAPERS • 1 BENCHMARK