ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection

Emotion recognition in conversations is crucial for building empathetic machines. Present works in this domain do not explicitly consider the inter-personal influences that thrive in the emotional dynamics of dialogues... To this end, we propose Interactive COnversational memory Network (ICON), a multimodal emotion detection framework that extracts multimodal features from conversational videos and hierarchically models the self- and inter-speaker emotional influences into global memories. Such memories generate contextual summaries which aid in predicting the emotional orientation of utterance-videos. Our model outperforms state-of-the-art networks on multiple classification and regression tasks in two benchmark datasets.
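The abstract describes two stages: hierarchically folding self- and inter-speaker influences into global memories, then reading those memories to produce a contextual summary for the utterance being classified. The sketch below illustrates that idea under loudly stated assumptions. It is not ICON's published implementation. The learned recurrent layers a real model would use are replaced by a fixed `tanh` update, the read step is a generic memory-network attention hop (softmax over dot products, then a weighted sum), and every name (`build_global_memories`, `memory_read`, `contextual_summary`, `decay`, `hops`) is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_global_memories(history: Sequence[Tuple[str, Vector]],
                          decay: float = 0.5) -> List[Vector]:
    """Hypothetical stand-in for the hierarchical modelling (not the paper's layers).

    Self-influence: each speaker keeps a state updated only by that speaker's utterances.
    Inter-speaker influence: one global state mixes each speaker's latest self state
    in dialogue order; every global state is stored as one memory slot.
    """
    dim = len(history[0][1])
    self_states: Dict[str, Vector] = {}
    global_state: Vector = [0.0] * dim
    memories: List[Vector] = []
    for speaker, features in history:
        prev_self = self_states.get(speaker, [0.0] * dim)
        self_state = [math.tanh(decay * p + x) for p, x in zip(prev_self, features)]
        self_states[speaker] = self_state
        global_state = [math.tanh(decay * g + s) for g, s in zip(global_state, self_state)]
        memories.append(global_state)
    return memories


def memory_read(query: Vector, memories: Sequence[Vector]) -> Tuple[Vector, List[float]]:
    """One attention hop: softmax over query-memory dot products, then a weighted sum."""
    weights = _softmax([_dot(query, m) for m in memories])
    summary = [sum(w * m[i] for w, m in zip(weights, memories)) for i in range(len(query))]
    return summary, weights


def contextual_summary(query: Vector, memories: Sequence[Vector], hops: int = 3) -> Vector:
    """Multi-hop read: each hop's summary is added to the query before the next hop."""
    q = list(query)
    for _ in range(hops):
        summary, _ = memory_read(q, memories)
        q = [a + b for a, b in zip(q, summary)]
    return q


history = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("A", [0.5, 0.5])]
memories = build_global_memories(history)
query_summary = contextual_summary([0.3, 0.7], memories)
```

Keeping a separate state per speaker and a single shared global state is one simple way to mirror the self- versus inter-speaker split the abstract names; a real model would learn these updates and feed the final summary to a classifier or regressor.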


Datasets


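Task                                 Dataset  Model  Metric            Value  Global rank
Emotion Recognition in Conversation  IEMOCAP  ICON   Weighted-F1       58.54  #25
Emotion Recognition in Conversation  IEMOCAP  ICON   Accuracy          59.09  #8
Emotion Recognition in Conversation  IEMOCAP  ICON   Macro-F1          56.52  #4
Emotion Recognition in Conversation  SEMAINE  ICON   MAE (Valence)     0.181  #5
Emotion Recognition in Conversation  SEMAINE  ICON   MAE (Arousal)     0.19   #5
Emotion Recognition in Conversation  SEMAINE  ICON   MAE (Expectancy)  0.185  #5
Emotion Recognition in Conversation  SEMAINE  ICON   MAE (Power)       8.45   #5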
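The table mixes classification metrics (IEMOCAP: accuracy, weighted-F1, macro-F1) with regression metrics (SEMAINE: mean absolute error per affective dimension). The sketch below uses the standard textbook definitions of these metrics; that the leaderboard computes them exactly this way (label set, averaging convention, percentage scaling) is an assumption, and the function names are illustrative.

```python
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 for every label that appears in either the gold or the predicted labels."""
    scores: Dict[Hashable, float] = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    # Unweighted mean: every emotion class counts equally, however rare it is.
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    # Mean weighted by each class's support (its count in the gold labels).
    scores = per_class_f1(y_true, y_pred)
    return sum(scores[c] * y_true.count(c) for c in scores) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean absolute error, as reported per dimension (valence, arousal, ...).
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


gold = ["hap", "sad", "sad", "ang"]
pred = ["hap", "sad", "ang", "ang"]
print(accuracy(gold, pred))               # 0.75
print(round(macro_f1(gold, pred), 3))     # 0.778
print(round(weighted_f1(gold, pred), 3))  # 0.75
```

Because macro-F1 weights rare classes as heavily as frequent ones while weighted-F1 follows class frequency, a model's rank can differ between the two, as the IEMOCAP rows show.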
