The Hateful Memes data set is a multimodal dataset for hateful meme detection (image + text) that contains 10,000+ new multimodal examples created by Facebook AI. Images were licensed from Getty Images so that researchers can use the data set to support their work.
144 PAPERS • 1 BENCHMARK
CARER is an emotion dataset collected through noisy labels, annotated via distant supervision as in (Go et al., 2009).
98 PAPERS • 4 BENCHMARKS
Over a period of many years during the 1990s, a large group of psychologists all over the world collected data in the ISEAR project, directed by Klaus R. Scherer and Harald Wallbott. Student respondents, both psychologists and non-psychologists, were asked to report situations in which they had experienced all of 7 major emotions (joy, fear, anger, sadness, disgust, shame, and guilt). In each case, the questions covered the way they had appraised the situation and how they reacted. The final data set thus contained reports on seven emotions each by close to 3000 respondents in 37 countries on all 5 continents.
49 PAPERS • NO BENCHMARKS YET
EmotionLines contains a total of 29245 labeled utterances from 2000 dialogues. Each utterance in dialogues is labeled with one of seven emotions, six Ekman’s basic emotions plus the neutral emotion. Each labeling was accomplished by 5 workers, and for each utterance in a label, the emotion category with the highest votes was set as the label of the utterance. Those utterances voted as more than two different emotions were put into the non-neutral category. Therefore the dataset has a total of 8 types of emotion labels, anger, disgust, fear, happiness, sadness, surprise, neutral, and non-neutral.
42 PAPERS • 1 BENCHMARK
We construct a dataset named CPED from 40 Chinese TV shows. CPED consists of multisource knowledge related to empathy and personal characteristic. This knowledge covers 13 emotions, gender, Big Five personality traits, 19 dialogue acts and other knowledge.
15 PAPERS • 3 BENCHMARKS
EmoWOZ is the first large-scale open-source dataset for emotion recognition in task-oriented dialogues. It contains emotion annotations for user utterances in the entire MultiWOZ (10k+ human-human dialogues) and DialMAGE (1k human-machine dialogues collected from our human trial). Overall, there are 83k user utterances annotated. In addition, the emotion annotation scheme is tailored to task-oriented dialogues and considers the valence, the elicitor, and the conduct of the user emotion.
8 PAPERS • 1 BENCHMARK
MUStARD++ is a multimodal sarcasm detection dataset (MUStARD) pre-annotated with 9 emotions. It can be used for the task of detecting the emotion in a sarcastic statement.
7 PAPERS • 1 BENCHMARK
ArmanEmo is a human-labeled emotion dataset of more than 7000 Persian sentences labeled for seven categories. The dataset has been collected from different resources, including Twitter, Instagram, and Digikala (an Iranian e-commerce company) comments. Labels are based on Ekman's six basic emotions (Anger, Fear, Happiness, Hatred, Sadness, Wonder) and another category (Other) to consider any other emotion not included in Ekman's model.
6 PAPERS • 1 BENCHMARK
Emotional Dialogue Acts data contains dialogue act labels for existing emotion multi-modal conversational datasets. We chose two popular multimodal emotion datasets: Multimodal EmotionLines Dataset (MELD) and Interactive Emotional dyadic MOtion CAPture database (IEMOCAP). EDAs reveal associations between dialogue acts and emotional states in a natural-conversational language such as Accept/Agree dialogue acts often occur with the Joy emotion, Apology with Sadness, and Thanking with Joy.
3 PAPERS • NO BENCHMARKS YET
EmoPars is a dataset of 30,000 Persian Tweets labeled with Ekman’s six basic emotions (Anger, Fear, Happiness, Sadness, Hatred, and Wonder). This is the first publicly available emotion dataset in the Persian language.
2 PAPERS • NO BENCHMARKS YET
The Sentimental LIAR dataset is a modified and further extended version of the LIAR extension introduced by Kirilin et al. In this dataset, the multi-class labeling of LIAR is converted to a binary annotation by changing half-true, false, barely-true and pants-fire labels to False, and the remaining labels to True. Furthermore, the speaker names are converted to numerical IDs in order to avoid bias with regards to the textual representation of names. The binary-label dataset is then extended by adding sentiments derived using the Google NLP API. Sentiment analysis determines the overall attitude of the text (i.e., whether it is positive or negative), and is quantified by a numerical score. If the sentiment score is positive, then the sample is tagged as Positive for the sentiment attribute, otherwise Negative is assigned. A further extension is introduced by adding emotion scores extracted using the IBM NLP API for each claim, which determine the detected level of 6 emotional states na
BanglaEmotion is a manually annotated Bangla Emotion corpus, which incorporates the diversity of fine-grained emotion expressions in social-media text. More fine-grained emotion labels are considered such as Sadness, Happiness, Disgust, Surprise, Fear and Anger - which are, according to Paul Ekman (1999), the six basic emotion categories. For this task, a large amount of raw text data are collected from the user’s comments on two different Facebook groups (Ekattor TV and Airport Magistrates) and from the public post of a popular blogger and activist Dr. Imran H Sarker. These comments are mostly reactions to ongoing socio-political issues and towards the economic success and failure of Bangladesh. A total of 32923 comments are scraped from the three sources aforementioned above. Out of these, a total of 6314 comments were annotated into the six categories. The distribution of the annotated corpus is as follows:
1 PAPER • NO BENCHMARKS YET
Dusha is a dataset for speech emotion recognition (SER) tasks. The corpus contains approximately 350 hours of data, more than 300 000 audio recordings with Russian speech and their transcripts. It is annotated using a crowd-sourcing platform and includes two subsets: acted and real-life.
1 PAPER • 2 BENCHMARKS
Reader Emotion News 20k Dataset
A modification on the ShEMO dataset with help of an Automatic Speech Recognition (ASR) system.
Affective Text (Test Corpus of SemEval 2007) by Carlo Strapparava & Rada Mihalcea.
0 PAPER • NO BENCHMARKS YET