Context-Dependent Sentiment Analysis in User-Generated Videos

Multimodal sentiment analysis is a developing area of research that involves identifying sentiment in videos. Current research treats utterances as independent entities, i.e., it ignores the interdependencies and relations among the utterances of a video. In this paper, we propose an LSTM-based model that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process. Our method shows a 5-10% performance improvement over the state of the art and generalizes robustly.
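The idea of letting each utterance absorb context from its neighbours can be illustrated with a small sketch. This is not the paper's implementation: it is a toy, untrained, pure-Python LSTM run forward and backward over the utterance sequence of one video, with the bidirectional design assumed from the "bc-LSTM" model name. The names `TinyLSTM` and `contextual_features`, the hidden size, and the random weights are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Toy LSTM with fixed random weights (hypothetical, untrained)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        n = input_dim + hidden_dim
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, g = candidate cell state.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                   for _ in range(hidden_dim)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = list(x) + h  # concatenate current input and previous hidden state
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        cand = [math.tanh(a) for a in self._affine("g", z)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, utterances):
        """Return one hidden state per utterance, in input order."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in utterances:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def contextual_features(utterances, hidden_dim=4):
    """Concatenate forward and backward hidden states for each utterance."""
    input_dim = len(utterances[0])
    fwd = TinyLSTM(input_dim, hidden_dim, seed=1).run(utterances)
    bwd = TinyLSTM(input_dim, hidden_dim, seed=2).run(utterances[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


# Two hypothetical videos that differ only in their last utterance.
video_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_b = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
feats_a = contextual_features(video_a)
feats_b = contextual_features(video_b)
```

In this sketch the middle utterance is identical in both videos, yet its contextual feature differs because the backward pass has seen a different following utterance. In a real system the inputs would be per-utterance multimodal feature vectors and the weights would be learned, with a classifier applied to each contextual feature.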

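Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Emotion Recognition in Conversation | CPED | bcLSTM | Accuracy of Sentiment | 49.65 | # 3
Emotion Recognition in Conversation | CPED | bcLSTM | Macro-F1 of Sentiment | 45.40 | # 3
Emotion Recognition in Conversation | IEMOCAP | bc-LSTM+Att | Weighted-F1 | 58.54 | # 43
Emotion Recognition in Conversation | IEMOCAP | bc-LSTM+Att | Accuracy | 59.09 | # 23
Emotion Recognition in Conversation | IEMOCAP | bc-LSTM+Att | Macro-F1 | 56.52 | # 4
Multimodal Emotion Recognition | IEMOCAP | bc-LSTM | Unweighted Accuracy (UA) | 0.741 | # 5
Emotion Recognition in Conversation | MELD | bc-LSTM+Att | Weighted-F1 | 56.44 | # 49
Emotion Recognition in Conversation | MELD | bc-LSTM+Att | Accuracy | 57.50 | # 12
Multimodal Sentiment Analysis | MOSI | bc-LSTM | Accuracy | 80.3% | # 8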

