The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an equal number of positive and negative reviews. Only highly polarized reviews are included: a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset also contains additional unlabeled data.
1,586 PAPERS • 11 BENCHMARKS
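The IMDb labeling rule described above (score ≤ 4 is negative, score ≥ 7 is positive, at most 30 reviews per movie) can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the dataset authors' actual pipeline; the function names and the `(movie_id, score, text)` tuple layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def label_from_score(score: int) -> Optional[str]:
    """Map a 1-10 rating to a sentiment label; None means too neutral to keep."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None  # scores 5 and 6 are excluded as insufficiently polarized


def select_reviews(
    reviews: Iterable[Tuple[str, int, str]],  # hypothetical (movie_id, score, text) layout
    max_per_movie: int = 30,
) -> List[Tuple[str, str]]:
    """Keep polarized reviews only, capped at max_per_movie per movie."""
    kept: List[Tuple[str, str]] = []
    per_movie: Counter = Counter()
    for movie_id, score, text in reviews:
        label = label_from_score(score)
        if label is None or per_movie[movie_id] >= max_per_movie:
            continue
        per_movie[movie_id] += 1
        kept.append((text, label))
    return kept
```

Returning `None` for mid-range scores keeps the exclusion explicit, so a caller cannot silently treat a neutral review as either class.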
The MPQA Opinion Corpus contains 535 news articles from a wide variety of news sources manually annotated for opinions and other private states (e.g., beliefs, emotions, sentiments, and speculations).
301 PAPERS • 3 BENCHMARKS
A dataset for understanding multimodal language used in expressing humor.
34 PAPERS • NO BENCHMARKS YET
The Norwegian Review Corpus (NoReC) was created for the purpose of training and evaluating models for document-level sentiment analysis. More than 43,000 full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1–6, as provided by the rating of the original author.
12 PAPERS • NO BENCHMARKS YET
Youtbean is a dataset created from closed captions of YouTube product review videos. It can be used for aspect extraction and sentiment classification.
3 PAPERS • NO BENCHMARKS YET
FinnSentiment is a dataset of 27,000 Finnish sentences, each annotated independently with sentiment polarity by three native annotators.
2 PAPERS • NO BENCHMARKS YET
Conversational Stance Detection (CSD) is a dataset with annotations of stances and the structures of conversation threads. It consists of 500 conversation threads (including 500 posts and 5,376 comments) from six major social media platforms in Hong Kong.
1 PAPER • NO BENCHMARKS YET