52 dataset results for Word Embeddings

This is a dataset for detecting death hoaxes. It consists of death reports collected from Twitter between 1st January, 2012 and 31st December, 2014. It was collected by tracking the keyword 'RIP' and keeping those tweets in which a name is mentioned next to RIP. Matching names were identified by using Wikidata as a database of names.
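The collection step described above (track 'RIP', then keep tweets where a known name sits next to it) could be sketched roughly as follows. This is only an illustration under stated assumptions: the small KNOWN_NAMES set is a hypothetical stand-in for a name list drawn from Wikidata, and find_reported_name, its max_words window, and the regular expression are not taken from the dataset authors.

```python
import re

# Hypothetical stand-in for a name list drawn from Wikidata.
KNOWN_NAMES = {"john smith", "jane doe"}

# 'RIP' as a standalone token, optionally written 'R.I.P.' (case-insensitive).
RIP_PATTERN = re.compile(r"\bR\.?I\.?P\b\.?", re.IGNORECASE)


def find_reported_name(tweet, names=KNOWN_NAMES, max_words=3):
    """Return a known name directly before or after 'RIP', else None."""
    match = RIP_PATTERN.search(tweet)
    if match is None:
        return None
    # Word tokens on either side of the first 'RIP' occurrence.
    before = re.findall(r"[\w'-]+", tweet[:match.start()])
    after = re.findall(r"[\w'-]+", tweet[match.end():])
    # Try the longest candidate first so full names win over single words.
    for n in range(max_words, 0, -1):
        for words in (after[:n], before[-n:]):
            if len(words) == n:
                candidate = " ".join(words).lower()
                if candidate in names:
                    return candidate
    return None


if __name__ == "__main__":
    for text in ["RIP John Smith, gone too soon",
                 "So sad, Jane Doe RIP",
                 "RIP my phone"]:
        print(text, "->", find_reported_name(text))
```

A real pipeline would need a far larger name list and some handling of ambiguous names; the sketch only shows the adjacency check.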

1 PAPER • NO BENCHMARKS YET

Urban Dict spelling variant

Urban Dict spelling variant is a variant spelling dataset for use in NLP research in the informal domain. It consists of around 25k variant spelling pairs from UrbanDictionary.

1 PAPER • NO BENCHMARKS YET

iLur News Texts

iLur News Texts is a dataset of over 12000 news articles from iLur.am, categorized into 7 classes: sport, politics, weather, economy, accidents, art, society. The articles are split into train (2242k tokens) and test (425k tokens) sets.

1 PAPER • NO BENCHMARKS YET

Word Analogy Bangla

We provide a Mikolov-style word-analogy evaluation set specifically for Bangla, with a sample size of 16678, as well as a translated and curated version of the Mikolov dataset, which contains 10594 samples for cross-lingual research.
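For readers unfamiliar with Mikolov-style analogy evaluation, the sketch below shows how such a set is typically scored: each question "a is to b as c is to ?" is answered with the vector offset b - a + c, and accuracy is the fraction of questions whose nearest word matches the expected one. The function names and the tiny English toy vectors are hypothetical and purely illustrative; they are not drawn from the Bangla dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question words are excluded from the candidate answers.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly.

    Questions containing out-of-vocabulary words are skipped.
    """
    answerable = [q for q in questions if all(w in vectors for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(vectors, a, b, c) == d
                  for a, b, c, d in answerable)
    return correct / len(answerable)


if __name__ == "__main__":
    # Hypothetical toy embeddings, chosen so the analogy holds exactly.
    toy_vectors = {
        "man": [1.0, 0.0, 0.0],
        "woman": [1.0, 1.0, 0.0],
        "king": [1.0, 0.0, 1.0],
        "queen": [1.0, 1.0, 1.0],
        "apple": [0.0, 0.0, 1.0],
    }
    toy_questions = [("man", "woman", "king", "queen")]
    print(analogy_accuracy(toy_vectors, toy_questions))
```

Skipping out-of-vocabulary questions is one common convention; some evaluations count them as errors instead, which lowers the reported accuracy.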

0 PAPER • NO BENCHMARKS YET
