UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition

21 Nov 2022 · Guimin Hu, Ting-En Lin, Yi Zhao, Guangming Lu, Yuchuan Wu, Yongbin Li

Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for enabling computers to understand human behavior. From a psychological perspective, emotions are expressions of affect or feeling over a short period, while sentiments are formed and held over a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies the MSA and ERC tasks at the feature, label, and model levels. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the differences and consistency between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method, which achieves consistent improvements over state-of-the-art methods.
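The inter-modality contrastive learning mentioned above can be illustrated with a minimal sketch. This is not the paper's exact implementation; it assumes an InfoNCE-style objective in which representations of the same sample from two modalities (e.g., text and audio) form positive pairs and all other samples in the batch serve as negatives. The function name, tensor shapes, and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def inter_modality_contrastive_loss(text_repr: torch.Tensor,
                                    audio_repr: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss that pulls together two modality views of the
    same sample and pushes apart views of different samples.

    text_repr, audio_repr: (batch, dim) tensors of modality embeddings.
    """
    # L2-normalize so dot products become cosine similarities.
    t = F.normalize(text_repr, dim=-1)
    a = F.normalize(audio_repr, dim=-1)
    # Similarity matrix: entry (i, j) compares sample i's text view
    # with sample j's audio view.
    logits = t @ a.t() / temperature
    # Positives lie on the diagonal (same sample, different modality).
    targets = torch.arange(t.size(0), device=t.device)
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

In this sketch, perfectly aligned modality representations drive the loss toward zero, while misaligned ones incur a penalty close to the log of the batch size.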

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Multimodal Sentiment Analysis | CMU-MOSEI | UniMSE | Accuracy | 87.50 | #2 |
| | | | MAE | 0.523 | #2 |
| | | | F1 | 87.46 | #2 |
| Multimodal Sentiment Analysis | CMU-MOSI | UniMSE | F1 | 86.42 | #1 |
| | | | MAE | 0.691 | #3 |
| | | | Corr | 0.809 | #2 |
| | | | Acc-7 | 48.68 | #2 |
| | | | Acc-2 | 86.9 | #3 |
| Emotion Recognition in Conversation | IEMOCAP | UniMSE | Weighted-F1 | 70.66 | #9 |
| | | | Accuracy | 70.56 | #6 |
| Emotion Recognition in Conversation | MELD | UniMSE | Weighted-F1 | 65.51 | #24 |
| | | | Accuracy | 65.09 | #10 |
| Multimodal Sentiment Analysis | MOSI | UniMSE | Accuracy | 86.9 | #2 |
| | | | F1 score | 86.42 | #1 |
