ROCStories is a collection of commonsense short stories. The corpus consists of 100,000 five-sentence stories. Each story logically follows everyday topics created by Amazon Mechanical Turk workers. These stories contain a variety of commonsense causal and temporal relations between everyday events. Writers also develop an additional 3,742 Story Cloze Test stories which contain a four-sentence-long body and two candidate endings. The endings were collected by asking Mechanical Turk workers to write both a right ending and a wrong ending after eliminating original endings of given short stories. Both endings were required to make logical sense and include at least one character from the main story line. The published ROCStories dataset is constructed with ROCStories as a training set that includes 98,162 stories that exclude candidate wrong endings, an evaluation set, and a test set, which have the same structure (1 body + 2 candidate endings) and a size of 1,871.
79 PAPERS • 2 BENCHMARKS
EmotionLines contains a total of 29245 labeled utterances from 2000 dialogues. Each utterance in dialogues is labeled with one of seven emotions, six Ekman’s basic emotions plus the neutral emotion. Each labeling was accomplished by 5 workers, and for each utterance in a label, the emotion category with the highest votes was set as the label of the utterance. Those utterances voted as more than two different emotions were put into the non-neutral category. Therefore the dataset has a total of 8 types of emotion labels, anger, disgust, fear, happiness, sadness, surprise, neutral, and non-neutral.
37 PAPERS • 1 BENCHMARK
EmoBank is a corpus of 10k English sentences balancing multiple genres, annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design.
30 PAPERS • NO BENCHMARKS YET
Over a period of many years during the 1990s, a large group of psychologists all over the world collected data in the ISEAR project, directed by Klaus R. Scherer and Harald Wallbott. Student respondents, both psychologists and non-psychologists, were asked to report situations in which they had experienced all of 7 major emotions (joy, fear, anger, sadness, disgust, shame, and guilt). In each case, the questions covered the way they had appraised the situation and how they reacted. The final data set thus contained reports on seven emotions each by close to 3000 respondents in 37 countries on all 5 continents.
21 PAPERS • NO BENCHMARKS YET
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations to 27 emotion categories or Neutral.
11 PAPERS • 1 BENCHMARK
A corpus designed in analogy to the well-established English ISEAR emotion dataset.
7 PAPERS • NO BENCHMARKS YET