Context-Dependent Sentiment Analysis in User-Generated Videos
Multimodal sentiment analysis is a developing area of research that involves identifying sentiment in videos. Current research treats utterances as independent entities, ignoring the interdependencies and relations among the utterances of a video. In this paper, we propose an LSTM-based model that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process. Our method shows a 5-10% performance improvement over the state of the art and exhibits strong generalizability.
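The core idea, letting each utterance attend to its neighbors via a bidirectional LSTM run over the utterance sequence of a video, can be sketched briefly. The following is a minimal PyTorch illustration, not the paper's exact implementation: the class name `ContextualUtteranceLSTM` and all hyperparameters (feature size, hidden size, dropout) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContextualUtteranceLSTM(nn.Module):
    """Sketch of a contextual utterance classifier: a bidirectional LSTM
    runs over the sequence of utterance-level features within one video,
    so each utterance's representation absorbs surrounding context before
    per-utterance sentiment classification. Hyperparameters are
    illustrative, not the paper's configuration."""

    def __init__(self, feat_dim=100, hidden_dim=300, num_classes=2, dropout=0.5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, utterance_feats):
        # utterance_feats: (batch, num_utterances, feat_dim); each batch
        # row holds the ordered utterances of one video.
        context, _ = self.lstm(utterance_feats)  # each utterance now sees its neighbors
        return self.classifier(self.dropout(context))  # per-utterance logits

# Example: a batch of 4 videos, each with 20 utterances of 100-d features.
model = ContextualUtteranceLSTM()
logits = model(torch.randn(4, 20, 100))  # -> shape (4, 20, num_classes)
```

In practice the input features would come from unimodal or fused multimodal utterance encoders; the sequence length is the number of utterances per video, not the number of words per utterance.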
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Emotion Recognition in Conversation | CPED | bcLSTM | Accuracy of Sentiment | 49.65 | # 3
Emotion Recognition in Conversation | CPED | bcLSTM | Macro-F1 of Sentiment | 45.40 | # 3
Multimodal Emotion Recognition | IEMOCAP | bc-LSTM | Unweighted Accuracy (UA) | 0.741 | # 5
Emotion Recognition in Conversation | IEMOCAP | bc-LSTM+Att | Weighted-F1 | 58.54 | # 57
Emotion Recognition in Conversation | IEMOCAP | bc-LSTM+Att | Accuracy | 59.09 | # 30
Emotion Recognition in Conversation | IEMOCAP | bc-LSTM+Att | Macro-F1 | 56.52 | # 4
Emotion Recognition in Conversation | MELD | bc-LSTM+Att | Weighted-F1 | 56.44 | # 64
Emotion Recognition in Conversation | MELD | bc-LSTM+Att | Accuracy | 57.50 | # 21
Multimodal Sentiment Analysis | MOSI | bc-LSTM | Accuracy | 80.3% | # 9