CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) is the largest dataset of sentence level sentiment analysis and emotion recognition in online videos. CMU-MOSEI contains more than 65 hours of annotated video from more than 1000 speakers and 250 topics.
154 PAPERS • 2 BENCHMARKS
ROCStories is a collection of commonsense short stories. The corpus consists of 100,000 five-sentence stories. Each story logically follows everyday topics created by Amazon Mechanical Turk workers. These stories contain a variety of commonsense causal and temporal relations between everyday events. Writers also develop an additional 3,742 Story Cloze Test stories which contain a four-sentence-long body and two candidate endings. The endings were collected by asking Mechanical Turk workers to write both a right ending and a wrong ending after eliminating original endings of given short stories. Both endings were required to make logical sense and include at least one character from the main story line. The published ROCStories dataset is constructed with ROCStories as a training set that includes 98,162 stories that exclude candidate wrong endings, an evaluation set, and a test set, which have the same structure (1 body + 2 candidate endings) and a size of 1,871.
139 PAPERS • 2 BENCHMARKS
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations to 27 emotion categories or Neutral.
102 PAPERS • 2 BENCHMARKS
Over a period of many years during the 1990s, a large group of psychologists all over the world collected data in the ISEAR project, directed by Klaus R. Scherer and Harald Wallbott. Student respondents, both psychologists and non-psychologists, were asked to report situations in which they had experienced all of 7 major emotions (joy, fear, anger, sadness, disgust, shame, and guilt). In each case, the questions covered the way they had appraised the situation and how they reacted. The final data set thus contained reports on seven emotions each by close to 3000 respondents in 37 countries on all 5 continents.
49 PAPERS • NO BENCHMARKS YET
EmotionLines contains a total of 29245 labeled utterances from 2000 dialogues. Each utterance in dialogues is labeled with one of seven emotions, six Ekman’s basic emotions plus the neutral emotion. Each labeling was accomplished by 5 workers, and for each utterance in a label, the emotion category with the highest votes was set as the label of the utterance. Those utterances voted as more than two different emotions were put into the non-neutral category. Therefore the dataset has a total of 8 types of emotion labels, anger, disgust, fear, happiness, sadness, surprise, neutral, and non-neutral.
42 PAPERS • 1 BENCHMARK
ArmanEmo is a human-labeled emotion dataset of more than 7000 Persian sentences labeled for seven categories. The dataset has been collected from different resources, including Twitter, Instagram, and Digikala (an Iranian e-commerce company) comments. Labels are based on Ekman's six basic emotions (Anger, Fear, Happiness, Hatred, Sadness, Wonder) and another category (Other) to consider any other emotion not included in Ekman's model.
6 PAPERS • 1 BENCHMARK
EmoPars is a dataset of 30,000 Persian Tweets labeled with Ekman’s six basic emotions (Anger, Fear, Happiness, Sadness, Hatred, and Wonder). This is the first publicly available emotion dataset in the Persian language.
2 PAPERS • NO BENCHMARKS YET
ReactionGIF is an affective dataset of 30K tweets which can be used for tasks like induced sentiment prediction and multilabel classification of induced emotions.
This repository contains a financial-domain-focused dataset for financial sentiment/emotion classification and stock market time series prediction. It's based on our paper: StockEmotions: Discover Investor Emotions for Financial Sentiment Analysis and Multivariate Time Series accepted by AAAI 2023 Bridge (AI for Financial Services).
BanglaEmotion is a manually annotated Bangla Emotion corpus, which incorporates the diversity of fine-grained emotion expressions in social-media text. More fine-grained emotion labels are considered such as Sadness, Happiness, Disgust, Surprise, Fear and Anger - which are, according to Paul Ekman (1999), the six basic emotion categories. For this task, a large amount of raw text data are collected from the user’s comments on two different Facebook groups (Ekattor TV and Airport Magistrates) and from the public post of a popular blogger and activist Dr. Imran H Sarker. These comments are mostly reactions to ongoing socio-political issues and towards the economic success and failure of Bangladesh. A total of 32923 comments are scraped from the three sources aforementioned above. Out of these, a total of 6314 comments were annotated into the six categories. The distribution of the annotated corpus is as follows:
1 PAPER • NO BENCHMARKS YET
Reader Emotion News 20k Dataset
The Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems specifically designed for spoken language. All datasets are in the English language and covers a large variety of domains (e.g daily life, scripted scenarios, joint task completion, phone call conversations, and televsion dialogue). Some datasets additionally include emotion and/or sentiment labels.
1 PAPER • 1 BENCHMARK
ShortPersianEmo is a new data set for emotion recognition in Persian short texts. The ShortPersianEmo dataset is a single-label dataset that contains 5472 short Persian texts collected from Twitter and Digikala. Our dataset is annotated according to Rachael Jack’s emotional model in five emotional classes happiness, sadness, anger, fear, and other. Unlike publicly accessible datasets that do not impose any restrictions on text length, ShortPersianEmo specifically focuses on short texts. The average text length in the ShortPersianEmo dataset is 56 words. Table 1 presents a comparison between the introduced ShortPersianEmo dataset and other datasets from the literature for emotion detection in Persian text. For more information on this dataset please read our paper. If you use this dataset in any research work, please cite our paper.