Multimodal Emotion Recognition
57 papers with code • 3 benchmarks • 9 datasets
This is a leaderboard for multimodal emotion recognition on the IEMOCAP dataset. The modality abbreviations are A: Acoustic, T: Text, V: Visual.
Please include the modalities in brackets after the model name.
All models must use the standard five emotion categories and are evaluated with the standard leave-one-session-out (LOSO) protocol. See the papers for references.
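As an illustration, the LOSO protocol on IEMOCAP's five sessions trains on four sessions and tests on the held-out fifth, rotating over all sessions. A minimal sketch (session names and the helper function are hypothetical, not from any listed paper):

```python
def loso_splits(sessions):
    """Yield (train_sessions, test_session) pairs, holding out one session per fold."""
    for held_out in sessions:
        train = [s for s in sessions if s != held_out]
        yield train, held_out

# IEMOCAP is recorded in five sessions, giving five LOSO folds.
sessions = ["Ses01", "Ses02", "Ses03", "Ses04", "Ses05"]
for train, test in loso_splits(sessions):
    print(f"train on {train}, test on {test}")
```

Reported scores are then typically averaged over the five folds.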
Libraries
Use these libraries to find Multimodal Emotion Recognition models and implementations.
Latest papers
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition
In comparison to state-of-the-art multimodal supervised learning models for dynamic emotion recognition, MultiMAE-DER improves the weighted average recall (WAR) by 4.41% on the RAVDESS dataset and by 2.06% on CREMA-D.
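Weighted average recall (WAR), the metric cited above, is the per-class recall weighted by each class's share of the test set, which is equivalent to overall accuracy. A small self-contained sketch (function name and toy labels are illustrative):

```python
from collections import Counter

def weighted_average_recall(y_true, y_pred):
    """WAR: per-class recall weighted by class frequency (equals overall accuracy)."""
    support = Counter(y_true)
    total = len(y_true)
    war = 0.0
    for cls, n in support.items():
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        war += (n / total) * (correct / n)  # weight each class recall by its support
    return war

y_true = ["happy", "sad", "sad", "angry"]
y_pred = ["happy", "sad", "happy", "angry"]
print(weighted_average_recall(y_true, y_pred))  # 0.75
```

By contrast, unweighted average recall (UAR) gives every class equal weight regardless of support, which matters on class-imbalanced emotion datasets.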
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
In addition to expanding the dataset size, we introduce a new track around open-vocabulary emotion recognition.
Cooperative Sentiment Agents for Multimodal Sentiment Analysis
In this paper, we propose a new Multimodal Representation Learning (MRL) method for Multimodal Sentiment Analysis (MSA), named Co-SA, which facilitates adaptive interaction between modalities through Cooperative Sentiment Agents.
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
Within the field of multimodal DFER, recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.
Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition
In particular, we compute the attention weights based on cross-correlation between the joint audio-visual-text feature representations and the feature representations of individual modalities to simultaneously capture intra- and intermodal relationships across the modalities.
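The idea of deriving attention weights from cross-correlation between a joint audio-visual-text representation and each individual modality can be sketched roughly as below. This is an illustrative NumPy toy, not the paper's exact formulation; the shapes, projection matrix, and softmax normalization are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 10, 16                                  # time steps, per-modality dim (assumed)
audio = rng.standard_normal((T, d))
visual = rng.standard_normal((T, d))
text = rng.standard_normal((T, d))

joint = np.concatenate([audio, visual, text], axis=-1)  # (T, 3d) joint representation
W = rng.standard_normal((3 * d, d)) * 0.1               # hypothetical projection

attended = []
for feats in (audio, visual, text):
    # Cross-correlation between the projected joint features and this modality,
    # normalized row-wise with a numerically stable softmax.
    corr = (joint @ W) @ feats.T                        # (T, T) cross-correlation
    corr -= corr.max(axis=-1, keepdims=True)
    attn = np.exp(corr) / np.exp(corr).sum(axis=-1, keepdims=True)
    attended.append(attn @ feats)                       # attention-weighted features
fused = np.concatenate(attended, axis=-1)               # (T, 3d) fused representation
```

Because the joint representation participates in every modality's attention weights, each re-weighted modality reflects both intra- and inter-modal relationships.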
Joint Multimodal Transformer for Emotion Recognition in the Wild
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems by leveraging the inter- and intra-modal relationships between, e.g., visual, textual, physiological, and auditory modalities.
Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition
Emotion recognition in conversation (ERC) is a crucial task in natural language processing and affective computing.
Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition
As a vital aspect of affective computing, Multimodal Emotion Recognition has been an active research area in the multimedia community.
GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion recognition, dynamic facial emotion recognition, and multimodal emotion recognition.