🔔 Share your dataset with the ML community!

Filter by Modality

Filter by Task (clear)

Filter by Language

91 dataset results for Sentiment Analysis

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

2,004 PAPERS • 13 BENCHMARKS

IMDb Movie Reviews

The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.

1,566 PAPERS • 11 BENCHMARKS

SST-2

1,491 PAPERS • 5 BENCHMARKS

MPQA Opinion Corpus (Multi-Perspective Question Answering)

The MPQA Opinion Corpus contains 535 news articles from a wide variety of news sources manually annotated for opinions and other private states (i.e., beliefs, emotions, sentiments, speculations, etc.).

300 PAPERS • 3 BENCHMARKS

SST-5

The SST-5, also known as the Stanford Sentiment Treebank with 5 labels, is a dataset used for sentiment analysis. The SST-5 dataset consists of 11,855 single sentences extracted from movie reviews¹. It includes a total of 215,154 unique phrases from parse trees, each annotated by 3 human judges¹. Each phrase is labeled as either negative, somewhat negative, neutral, somewhat positive, or positive. This is why it's referred to as SST-5 or SST fine-grained.

285 PAPERS • 3 BENCHMARKS

ReDial

ReDial (Recommendation Dialogues) is an annotated dataset of dialogues, where users recommend movies to each other. The dataset consists of over 10,000 conversations centered around the theme of providing movie recommendations.

86 PAPERS • 2 BENCHMARKS

TweetEval

TweetEval introduces an evaluation framework consisting of seven heterogeneous Twitter-specific classification tasks.

71 PAPERS • 2 BENCHMARKS

Yelp

The Yelp Dataset is a valuable resource for academic research, teaching, and learning. It provides a rich collection of real-world data related to businesses, reviews, and user interactions. Here are the key details about the Yelp Dataset: Reviews: A whopping 6,990,280 reviews from users. Businesses: Information on 150,346 businesses. Pictures: A collection of 200,100 pictures. Metropolitan Areas: Data from 11 metropolitan areas. Tips: Over 908,915 tips provided by 1,987,897 users. Business Attributes: Details like hours, parking availability, and ambiance for more than 1.2 million businesses. Aggregated Check-ins: Historical check-in data for each of the 131,930 businesses.

68 PAPERS • 23 BENCHMARKS

Multi-Domain Sentiment

The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com from many product types (domains). Some domains (books and dvds) have hundreds of thousands of reviews. Others (musical instruments) have only a few hundred. Reviews contain star ratings (1 to 5 stars) that can be converted into binary labels if needed.

53 PAPERS • 1 BENCHMARK

ASTD

ASTD (Arabic Sentiment Tweets Dataset)

Arabic Sentiment Tweets Dataset (ASTD) is an Arabic social sentiment analysis dataset gathered from Twitter. It consists of about 10,000 tweets which are classified as objective, subjective positive, subjective negative, and subjective mixed.

30 PAPERS • 1 BENCHMARK

SemEval-2013 Task-2

The SemEval-2013 Task 2 dataset contains data for two subtasks: A, an expression-level subtask, and B, a message-level subtask. Crowdsourcing was used to label a large Twitter training dataset along with additional test sets of Twitter and SMS messages for both subtasks.

30 PAPERS • NO BENCHMARKS YET

MR (MR Movie Reviews)

MR Movie Reviews is a dataset for use in sentiment-analysis experiments. Available are collections of movie-review documents labeled with respect to their overall sentiment polarity (positive or negative) or subjective rating (e.g., "two and a half stars") and sentences labeled with respect to their subjectivity status (subjective or objective) or polarity.

27 PAPERS • 4 BENCHMARKS

SentiMix

Sentiment analysis of codemixed tweets.

27 PAPERS • NO BENCHMARKS YET

SLUE

SLUE (Spoken Language Understanding Evaluation)

Spoken Language Understanding Evaluation (SLUE) is a suite of benchmark tasks for spoken language understanding evaluation. It consists of limited-size labeled training sets and corresponding evaluation sets. This resource would allow the research community to track progress, evaluate pre-trained representations for higher-level tasks, and study open questions such as the utility of pipeline versus end-to-end approaches. The first phase of the SLUE benchmark suite consists of named entity recognition (NER), sentiment analysis (SA), and ASR on the corresponding datasets.

19 PAPERS • 3 BENCHMARKS

iSarcasm

iSarcasm is a dataset of tweets, each labelled as either sarcastic or non_sarcastic. Each sarcastic tweet is further labelled for one of the following types of ironic speech:

17 PAPERS • 1 BENCHMARK

LABR

LABR (Large-Scale Arabic Book Reviews)

LABR is a large sentiment analysis dataset to-date for the Arabic language. It consists of over 63,000 book reviews, each rated on a scale of 1 to 5 stars.

16 PAPERS • 1 BENCHMARK

ArSarcasm-v2

ArSarcasm-v2 is an extension of the original ArSarcasm dataset published along with the paper From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset. ArSarcasm-v2 conisists of ArSarcasm along with portions of DAICT corpus and some new tweets. Each tweet was annotated for sarcasm, sentiment and dialect. The final dataset consists of 15,548 tweets divided into 12,548 training tweets and 3,000 testing tweets. ArSarcasm-v2 was used and released as a part of the shared task on sarcasm detection and sentiment analysis in Arabic.

14 PAPERS • NO BENCHMARKS YET

DAiSEE

DAiSEE is a multi-label video classification dataset comprising of 9,068 video snippets captured from 112 users for recognizing the user affective states of boredom, confusion, engagement, and frustration "in the wild". The dataset has four levels of labels namely - very low, low, high, and very high for each of the affective states, which are crowd annotated and correlated with a gold standard annotation created using a team of expert psychologists.

14 PAPERS • NO BENCHMARKS YET

DynaSent

DynaSent is an English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis. DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform, which facilities human-and-model-in-the-loop dataset creation. DynaSent has a total of 121,634 sentences, each validated by five crowdworkers.

14 PAPERS • 1 BENCHMARK

IndicGLUE

IndicGLUE (Indic General Language Understanding Evaluation Benchmark)

We now introduce IndicGLUE, the Indic General Language Understanding Evaluation Benchmark, which is a collection of various NLP tasks as de- scribed below. The goal is to provide an evaluation benchmark for natural language understanding ca- pabilities of NLP models on diverse tasks and mul- tiple Indian languages.

14 PAPERS • 3 BENCHMARKS

CH-SIMS

CH-SIMS is a Chinese single- and multimodal sentiment analysis dataset which contains 2,281 refined video segments in the wild with both multimodal and independent unimodal annotations. It allows researchers to study the interaction between modalities or use independent unimodal annotations for unimodal sentiment analysis.

13 PAPERS • 1 BENCHMARK

PANDORA

PANDORA is the first large-scale dataset of Reddit comments labeled with three personality models (including the well-established Big 5 model) and demographics (age, gender, and location) for more than 10k users.

13 PAPERS • NO BENCHMARKS YET

ArSarcasm

ArSarcasm is a new Arabic sarcasm detection dataset. The dataset was created using previously available Arabic sentiment analysis datasets (SemEval 2017 and ASTD) and adds sarcasm and dialect labels to them. The dataset contains 10,547 tweets, 1,682 (16%) of which are sarcastic.

12 PAPERS • NO BENCHMARKS YET

ASC (TIL, 19 tasks)

ASC (TIL, 19 tasks) (Task Incremental Aspect Sentiment Classification)

A set of 19 ASC datasets (reviews of 19 products) producing a sequence of 19 tasks. Each dataset represents a task. The datasets are from 4 sources: (1) HL5Domains (Hu and Liu, 2004) with reviews of 5 products; (2) Liu3Domains (Liu et al., 2015) with reviews of 3 products; (3) Ding9Domains (Ding et al., 2008) with reviews of 9 products; and (4) SemEval14 with reviews of 2 products - SemEval 2014 Task 4 for laptop and restaurant. For (1), (2) and (3), we split about 10% of the original data as the validate data, another about 10% of the original data as the testing data. For (4), We use 150 examples from the training set for validation. To be consistent with existing research(Tang et al., 2016), examples belonging to the conflicting polarity (both positive and negative sentiments are expressed about an aspect term) are not used. Statistics and details of the 19 datasets are given on Page https://github.com/ZixuanKe/PyContinual.

11 PAPERS • 1 BENCHMARK

NoReC

NoReC (Norwegian Review Corpus)

The Norwegian Review Corpus (NoReC) was created for the purpose of training and evaluating models for document-level sentiment analysis. More than 43,000 full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1–6, as provided by the rating of the original author.

11 PAPERS • NO BENCHMARKS YET

SentNoB

SentNoB (SentNoB: A Dataset for Analysing Sentiment on Noisy Bangla Texts)

Social Media User Sentiment Analysis Dataset. Each user comments are labeled with either positive (1), negative (2), or neutral (0).

11 PAPERS • NO BENCHMARKS YET

TaPaCo

TaPaCo is a freely available paraphrase corpus for 73 languages extracted from the Tatoeba database.

11 PAPERS • 1 BENCHMARK

HappyDB

HappyDB is a corpus of 100,000 crowdsourced happy moments.

9 PAPERS • NO BENCHMARKS YET

TSAC

TSAC (Tunisian Sentiment Analysis Corpus)

Tunisian Sentiment Analysis Corpus (TSAC) is a Tunisian Dialect corpus of 17.000 comments from Facebook.

9 PAPERS • NO BENCHMARKS YET

ArSentD-LEV

The Arabic Sentiment Twitter Dataset for the Levantine dialect (ArSenTD-LEV) is a dataset of 4,000 tweets with the following annotations: the overall sentiment of the tweet, the target to which the sentiment was expressed, how the sentiment was expressed, and the topic of the tweet.

8 PAPERS • NO BENCHMARKS YET

MultiBooked

MultiBooked is a dataset for supervised aspect-level sentiment analysis in Basque and Catalan, both of which are under-resourced languages.

8 PAPERS • NO BENCHMARKS YET

PHINC

PHINC is a parallel corpus of the 13,738 code-mixed English-Hindi sentences and their corresponding translation in English. The translations of sentences are done manually by the annotators.

8 PAPERS • NO BENCHMARKS YET

Perspectrum

Perspectrum is a dataset of claims, perspectives and evidence, making use of online debate websites to create the initial data collection, and augmenting it using search engines in order to expand and diversify the dataset. Crowd-sourcing was used to filter out noise and ensure high-quality data. The dataset contains 1k claims, accompanied with pools of 10k and 8k perspective sentences and evidence paragraphs, respectively.

8 PAPERS • 1 BENCHMARK

JGLUE

JGLUE, Japanese General Language Understanding Evaluation, is built to measure the general NLU ability in Japanese.

7 PAPERS • NO BENCHMARKS YET

L3CubeMahaSent

L3CubeMahaSent is a large publicly available Marathi Sentiment Analysis dataset. It consists of marathi tweets which are manually labelled.

7 PAPERS • NO BENCHMARKS YET

SubjQA

SubjQA is a question answering dataset that focuses on subjective (as opposed to factual) questions and answers. The dataset consists of roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants. Each question is paired with a review and a span is highlighted as the answer to the question (with some questions having no answer). Moreover, both questions and answer spans are assigned a subjectivity label by annotators. Questions such as "How much does this product weigh?" is a factual question (i.e., low subjectivity), while "Is this easy to use?" is a subjective question (i.e., high subjectivity).

7 PAPERS • 1 BENCHMARK

TUNIZI

A sentiment analysis Tunisian Arabizi Dataset, collected from social networks, preprocessed for analytical studies and annotated manually by Tunisian native speakers.

7 PAPERS • NO BENCHMARKS YET

DSC (10 tasks)

DSC (10 tasks) (Task Incremental Document Sentiment Classification)

A set of 10 DSC datasets (reviews of 10 products) to produce sequences of tasks. The products are Sports, Toys, Tools, Video, Pet, Musical, Movies, Garden, Offices, and Kindle. 2500 positive and 2500 negative training reviews per task . The validation reviews are with 250 positive and 250 negative and the test reviews are with 250 positive and 250 negative reviews. The detailed statistic on page https://github.com/ZixuanKe/PyContinual

6 PAPERS • 1 BENCHMARK

SemEval-2020 Task-8

A multimodal dataset for sentiment analysis on internet memes.

6 PAPERS • NO BENCHMARKS YET

RuSentRel

RuSentRel is a corpus of analytical articles translated into Russian texts in the domain of international politics obtained from foreign authoritative sources. The collected articles contain both the author's opinion on the subject matter of the article and a large number of references mentioned between the participants of the described situations. In total, 73 large analytical texts were labeled with about 2000 relations.

5 PAPERS • NO BENCHMARKS YET

Laptop-ACOS

Laptop-ACOS is a brand new Laptop dataset collected from the Amazon platform in the years 2017 and 2018 (covering ten types of laptops under six brands such as ASUS, Acer, Samsung, Lenovo, MBP, MSI, and so on). It contains 4,076 review sentences, much larger than the SemEval Laptop datasets. For Laptop-ACOS, we annotate the four elements and their corresponding quadruples all by ourselves. We employ the aspect categories defined in the SemEval 2016 Laptop dataset. The Laptop-ACOS dataset contains 4076 sentences with 5758 quadruples. As we have mentioned, a large percentage of the quadruples contain implicit aspects or implicit opinions . By comparing two datasets, it can be observed that Laptop-ACOS has a higher percentage of implicit opinions than Restaurant-ACOS . It is worth noting that the Laptop-ACOS is available for all subtasks in ABSA, including aspect-based sentiment classification, aspect-sentiment pair extraction, aspect-opinion pair extraction, aspect-opinion sentiment tri

4 PAPERS • 1 BENCHMARK

MFRC (Moral Foundations Reddit Corpus)

Moral Foundations Reddit Corpus (MFRC) is a collection of 16,123 Reddit comments that have been curated from 12 distinct subreddits, hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework.

4 PAPERS • NO BENCHMARKS YET

Restaurant-ACOS

The Restaurant-ACOS dataset is constructed based on the SemEval 2016 Restaurant dataset (Pontiki et al., 2016) and its expansion datasets (Fan et al., 2019; Xu et al., 2020). The SemEval 2016 Restaurant dataset (Pontiki et al., 2016) was annotated with explicit and implicit aspects, categories, and sentiment. (Fan et al., 2019; Xu et al., 2020) further added the opinion annotations. We integrate their annotations to construct aspect-category-opinion-sentiment quadruples and further annotate the implicit opinions. The Restaurant-ACOS dataset contains 2286 sentences with 3658 quadruples. It is worth noting that the Restaurant-ACOS is available for all subtasks in ABSA, including aspect-based sentiment classification, aspect-sentiment pair extraction, aspect-opinion pair extraction, aspect-opinion sentiment triple extraction, aspect-category-sentiment triple extraction, etc.

4 PAPERS • 1 BENCHMARK

WikiSem500

The WikiSem500 dataset contains around 500 per-language cluster groups for English, Spanish, German, Chinese, and Japanese (a total of 13,314 test cases).

4 PAPERS • NO BENCHMARKS YET

Amazon Fine Foods

Amazon Fine Foods is a dataset that consists of reviews of fine foods from amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plaintext review.

3 PAPERS • NO BENCHMARKS YET

CAIL2019-SCM

Chinese AI and Law 2019 Similar Case Matching dataset. CAIL2019-SCM contains 8,964 triplets of cases published by the Supreme People's Court of China. CAIL2019-SCM focuses on detecting similar cases, and the participants are required to check which two cases are more similar in the triplets.

3 PAPERS • 2 BENCHMARKS

GeoCoV19

GeoCoV19 is a large-scale Twitter dataset containing more than 524 million multilingual tweets. The dataset contains around 378K geotagged tweets and 5.4 million tweets with Place information. The annotations include toponyms from the user location field and tweet content and resolve them to geolocations such as country, state, or city level. In this case, 297 million tweets are annotated with geolocation using the user location field and 452 million tweets using tweet content.

3 PAPERS • NO BENCHMARKS YET

Pars-ABSA

Pars-ABSA is a manually annotated Persian dataset, Pars-ABSA, which is verified by 3 native Persian speakers. The dataset consists of 5,114 positive, 3,061 negative and 1,827 neutral data samples from 5,602 unique reviews.

3 PAPERS • NO BENCHMARKS YET

Datasets

91 dataset results for Sentiment Analysis