This is a dataset for detection fake death hoaxes. It consists of of death reports collected from Twitter between 1st January, 2012 and 31st December, 2014. It was collected by tracking the keyword 'RIP', and matching those tweets in which a name is mentioned next to RIP. Matching names were identified by using Wikidata as a database of names.
1 PAPER • NO BENCHMARKS YET
Urban Dict spelling variant is a variant spelling dataset for use of NLP research in the informal domain. It consists of around 25k variant spelling pairs form UrbanDictionary.
iLur News Texts is a dataset of over 12000 news articles from iLur.am, categorized into 7 classes: sport, politics, weather, economy, accidents, art, society. The articles are split into train (2242k tokens) and test sets (425k tokens).
We provide a Mikolov-style word-analogy evaluation set specifically for Bangla, with a sample size of 16678, as well as a translated and curated version of the Mikolov dataset, which contains 10594 samples for cross-lingual research.
0 PAPER • NO BENCHMARKS YET