This is an entity-level Twitter sentiment analysis dataset. For each message, the task is to judge the sentiment of the whole message towards a given entity. For example, "A outperforms B" is positive for entity A but negative for entity B. The dataset contains ~70K labeled training messages and 1K labeled validation messages. It is freely available on Kaggle.
4 PAPERS • 1 BENCHMARK
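To make the entity-level labeling in the entry above concrete, here is a minimal sketch of how a single message can yield one labeled example per entity. The class name, field names, and helper function are hypothetical and chosen for illustration only; they are not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySentimentExample:
    """One (message, entity, label) triple. Field names are hypothetical."""
    text: str
    entity: str
    label: str


def examples_for(text: str, labels_by_entity: dict[str, str]) -> list[EntitySentimentExample]:
    """Expand one message into one labeled example per entity."""
    return [
        EntitySentimentExample(text=text, entity=entity, label=label)
        for entity, label in labels_by_entity.items()
    ]


# The same sentence carries a different label depending on the target entity.
examples = examples_for("A outperforms B", {"A": "positive", "B": "negative"})
for ex in examples:
    print(f"{ex.entity}: {ex.label}")
```

The point of the sketch is that the label belongs to the (message, entity) pair rather than to the message alone, so a model for this task needs the entity as an input alongside the text.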
RETWEET is a dataset of tweets paired with the overall predominant sentiment of their replies.
2 PAPERS • 1 BENCHMARK
The #chinahate dataset contains a total of 2,172,333 tweets with the hashtag #china, posted during the period in which it was collected. It is designed for the task of hate speech detection.
1 PAPER • NO BENCHMARKS YET
The dataset contains 30 million cryptocurrency-related tweets posted between October 10, 2020 and March 3, 2021. See https://github.com/meakbiyik/ask-who-not-what for more details.
SentimentArcs’ reference corpus for novels consists of 25 narratives selected to create a diverse set of well-recognized novels that can serve as a benchmark for future studies. The composition of the corpus was limited by copyright law as well as historical imbalances. Most works were obtained from the US and Australian Gutenberg Projects. The corpus is expected to grow in size and diversity over time.