Tracing Intricate Cues in Dialogue: Joint Graph Structure and Sentiment Dynamics for Multimodal Emotion Recognition

31 Jul 2024 · Jiang Li, XiaoPing Wang, Zhigang Zeng

Multimodal emotion recognition in conversation (MERC) has garnered substantial research attention recently. Existing MERC methods face several challenges: (1) they fail to fully harness direct inter-modal cues, possibly leading to less-than-thorough cross-modal modeling; (2) they concurrently extract information from the same and different modalities at each network layer, potentially triggering conflicts from the fusion of multi-source data; (3) they lack the agility required to detect dynamic sentiment changes, perhaps resulting in inaccurate classification of utterances with abrupt sentiment shifts. To address these issues, a novel approach named GraphSmile is proposed for tracking intricate emotional cues in multimodal dialogues. GraphSmile comprises two key components: the GSF and SDP modules. GSF leverages graph structures to alternately assimilate inter-modal and intra-modal emotional dependencies layer by layer, adequately capturing cross-modal cues while effectively circumventing fusion conflicts. SDP is an auxiliary task that explicitly delineates the sentiment dynamics between utterances, promoting the model's ability to distinguish sentiment discrepancies. Furthermore, GraphSmile can be effortlessly applied to multimodal sentiment analysis in conversation (MSAC), forging a unified multimodal affective model capable of executing both MERC and MSAC tasks. Empirical results on multiple benchmarks demonstrate that GraphSmile can handle complex emotional and sentimental patterns, significantly outperforming baseline models.
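The abstract's account of GSF (graph layers that alternate between inter-modal and intra-modal propagation rather than mixing both edge types in one layer) can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions: the module names (GraphLayer, GSFBlock), the residual connections, and the two-adjacency design are hypothetical and do not reproduce the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphLayer(nn.Module):
    """One mean-aggregation graph convolution step: h' = relu(norm(A) @ h @ W)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # Row-normalize the adjacency so each node averages over its neighbors.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return F.relu((adj / deg) @ self.proj(h))

class GSFBlock(nn.Module):
    """Alternates inter-modal and intra-modal message passing, layer by layer."""
    def __init__(self, dim, depth=2):
        super().__init__()
        self.inter = nn.ModuleList(GraphLayer(dim) for _ in range(depth))
        self.intra = nn.ModuleList(GraphLayer(dim) for _ in range(depth))

    def forward(self, h, adj_inter, adj_intra):
        # h: (num_nodes, dim), one node per (utterance, modality) pair;
        # adj_inter connects the same utterance across modalities,
        # adj_intra connects neighboring utterances within one modality.
        for inter, intra in zip(self.inter, self.intra):
            h = h + inter(h, adj_inter)  # absorb cross-modal cues first
            h = h + intra(h, adj_intra)  # then within-modal conversational context
        return h

Keeping inter-modal and intra-modal edges in separate propagation steps is, per the abstract, what lets the model capture cross-modal cues without the fusion conflicts that arise when multi-source information is aggregated in a single layer.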
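The SDP auxiliary task, which explicitly delineates sentiment dynamics between utterances, admits a similarly hedged sketch: one plausible reading is a classification head over adjacent utterance pairs that predicts how sentiment changes from one utterance to the next. The head below, its num_dynamics label set, and the loss weighting lam are illustrative assumptions, not the paper's definition of the task.

import torch
import torch.nn as nn

class SDPHead(nn.Module):
    """Scores the sentiment dynamic between each utterance and its successor."""
    def __init__(self, dim, num_dynamics=3):  # assumed labels, e.g. up/down/unchanged
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_dynamics))

    def forward(self, utt):
        # utt: (num_utterances, dim) fused utterance representations.
        pairs = torch.cat([utt[:-1], utt[1:]], dim=-1)  # adjacent pairs
        return self.classifier(pairs)                   # (num_utterances - 1, num_dynamics)

Training would then combine the main emotion loss with this auxiliary loss, e.g. loss = ce(emotion_logits, labels) + lam * ce(sdp_logits, dynamics_labels), giving utterances around abrupt sentiment shifts an explicit supervisory signal.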


Datasets

CMU-MOSEI · IEMOCAP · MELD

Results

Task                                  Dataset                Model        Weighted F1   Accuracy
Emotion Recognition in Conversation   CMU-MOSEI-Sentiment    GraphSmile   44.93 (#1)    46.82 (#1)
Multimodal Emotion Recognition        CMU-MOSEI-Sentiment    GraphSmile   44.93 (#1)    46.82 (#1)
Multimodal Emotion Recognition        CMU-MOSEI-Sentiment-3  GraphSmile   66.73 (#1)    67.73 (#1)
Emotion Recognition in Conversation   CMU-MOSEI-Sentiment-3  GraphSmile   66.73 (#1)    67.73 (#1)
Emotion Recognition in Conversation   IEMOCAP                GraphSmile   72.81 (#4)    72.77 (#5)
Multimodal Emotion Recognition        IEMOCAP                GraphSmile   72.81 (#1)    72.77 (#1)
Multimodal Emotion Recognition        IEMOCAP-4              GraphSmile   86.52 (#1)    86.53 (#1)
Emotion Recognition in Conversation   IEMOCAP-4              GraphSmile   86.52 (#1)    86.53 (#1)
Emotion Recognition in Conversation   MELD                   GraphSmile   66.71 (#15)   67.70 (#8)
Multimodal Emotion Recognition        MELD                   GraphSmile   66.71 (#1)    67.70 (#1)
Multimodal Emotion Recognition        MELD-Sentiment         GraphSmile   74.31 (#1)    74.44 (#1)
Emotion Recognition in Conversation   MELD-Sentiment         GraphSmile   74.31 (#1)    74.44 (#1)

(#n) denotes the global rank on that task's benchmark leaderboard.

Methods


No methods listed for this paper.