Multimodal Emotion Recognition IEMOCAP The IEMOCAP dataset consists of 151 videos of recorded dialogues, with 2 speakers per session for a total of 302 videos across the dataset. Each segment is annotated for the presence of 9 emotions (angry, excited, fear, sad, surprised, frustrated, happy, disappointed and neutral) as well as valence, arousal and dominance. The dataset is recorded across 5 sessions with 5 pairs of speakers.
204 PAPERS • 3 BENCHMARKS
DailyDialog is a high-quality multi-turn open-domain English dialog dataset. It contains 13,118 dialogues split into a training set with 11,118 dialogues and validation and test sets with 1000 dialogues each. On average there are around 8 speaker turns per dialogue with around 15 tokens per turn.
93 PAPERS • 2 BENCHMARKS
The SEMAINE videos dataset contains spontaneous data capturing the audiovisual interaction between a human and an operator undertaking the role of an avatar with four personalities: Poppy (happy), Obadiah (gloomy), Spike (angry) and Prudence (pragmatic). The audiovisual sequences have been recorded at a video rate of 25 fps (352 x 288 pixels). The dataset consists of audiovisual interaction between a human and an operator undertaking the role of an agent (Sensitive Artificial Agent). SEMAINE video clips have been annotated with couples of epistemic states such as agreement, interested, certain, concentration, and thoughtful with continuous rating (within the range [1,-1]) where -1 indicates most negative rating (i.e: No concentration at all) and +1 defines the highest (Most concentration). Twenty-four recording sessions are used in the Solid SAL scenario. Recordings are made of both the user and the operator, and there are usually four character interactions in each recording session,
38 PAPERS • 1 BENCHMARK
EmoContext consists of three-turn English Tweets. The emotion labels include happiness, sadness, anger and other.
36 PAPERS • 1 BENCHMARK
Multimodal EmotionLines Dataset (MELD) has been created by enhancing and extending EmotionLines dataset. MELD contains the same dialogue instances available in EmotionLines, but it also encompasses audio and visual modality along with text. MELD has more than 1400 dialogues and 13000 utterances from Friends TV series. Multiple speakers participated in the dialogues. Each utterance in a dialogue has been labeled by any of these seven emotions -- Anger, Disgust, Sadness, Joy, Neutral, Surprise and Fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance.
30 PAPERS • 1 BENCHMARK
EmotionLines contains a total of 29245 labeled utterances from 2000 dialogues. Each utterance in dialogues is labeled with one of seven emotions, six Ekman’s basic emotions plus the neutral emotion. Each labeling was accomplished by 5 workers, and for each utterance in a label, the emotion category with the highest votes was set as the label of the utterance. Those utterances voted as more than two different emotions were put into the non-neutral category. Therefore the dataset has a total of 8 types of emotion labels, anger, disgust, fear, happiness, sadness, surprise, neutral, and non-neutral.
22 PAPERS • 1 BENCHMARK
EmoryNLP comprises 97 episodes, 897 scenes, and 12,606 utterances, where each utterance is annotated with one of the seven emotions borrowed from the six primary emotions in the Willcox (1982)’s feeling wheel, sad, mad, scared, powerful, peaceful, joyful, and a default emotion of neutral.
7 PAPERS • 1 BENCHMARK