Multimodal Emotion Recognition
57 papers with code • 3 benchmarks • 9 datasets
This is a leaderboard for multimodal emotion recognition on the IEMOCAP dataset. The modality abbreviations are A: Acoustic, T: Text, V: Visual.
Please include the modalities in brackets after the model name.
All models must use the standard five emotion categories and are evaluated with the standard leave-one-session-out (LOSO) protocol. See the papers for references.
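The LOSO protocol holds out one of IEMOCAP's five recorded sessions as the test set in each fold and averages the scores. A minimal pure-Python sketch of this split (the session ids and toy features below are illustrative, not real IEMOCAP data):

```python
# Minimal sketch of leave-one-session-out (LOSO) evaluation.
# IEMOCAP has 5 recorded sessions; `data` below is a toy stand-in.

def loso_splits(samples):
    """Yield (held_out_session, train_idx, test_idx) for each fold.

    `samples` is a list of (session_id, features, label) tuples.
    """
    sessions = sorted({s for s, _, _ in samples})
    for held_out in sessions:
        train = [i for i, (s, _, _) in enumerate(samples) if s != held_out]
        test = [i for i, (s, _, _) in enumerate(samples) if s == held_out]
        yield held_out, train, test

# Toy data: real entries would carry acoustic/text/visual embeddings.
data = [(1, [0.1], "ang"), (1, [0.2], "hap"),
        (2, [0.3], "sad"), (3, [0.4], "neu"),
        (4, [0.5], "exc"), (5, [0.6], "ang")]

for held_out, train, test in loso_splits(data):
    # Train on `train`, evaluate on `test`; fold scores are then averaged.
    pass
```

Because speakers do not recur across sessions, this split also guarantees speaker-independent evaluation.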
Libraries
Use these libraries to find Multimodal Emotion Recognition models and implementations
Most implemented papers
Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition
Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationships between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially unseen emotions.
Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion
Emotion recognition is a challenging research area given its complex nature: humans express emotional cues across various modalities such as language, facial expressions, and speech.
MSAF: Multimodal Split Attention Fusion
Multimodal learning mimics the reasoning process of the human multi-sensory system, which is used to perceive the surrounding world.
Combining deep and unsupervised features for multilingual speech emotion recognition
The proposed model, PATHOSnet, was trained and evaluated on multiple corpora with different spoken languages (IEMOCAP, EmoFilm, SES and AESI).
Multimodal Emotion Recognition with High-level Speech and Text Features
Current works train deep learning models on low-level data representations to solve the emotion recognition task.
A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition
Firstly, we perform representation learning for audio and video modalities to obtain the semantic features of the two modalities by efficient ResNeXt and 1D CNN, respectively.
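Cross-modal fusion of this kind typically lets one modality's features attend over another's. A minimal pure-Python sketch of single-head scaled dot-product cross-attention (a generic illustration, not this paper's exact architecture; the toy audio/video vectors are assumptions):

```python
import math

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    `queries` come from one modality (e.g. audio frames); `keys` and
    `values` from another (e.g. video frames). Each is a list of
    equal-length float vectors.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exp = [math.exp(s - m) for s in scores]
        z = sum(exp)
        weights = [e / z for e in exp]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# One toy audio query attends over two toy video frames.
audio = [[1.0, 0.0]]
video = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(audio, video, video)
```

The fused vector leans toward the video frame most similar to the audio query, which is the core mechanism behind self-attention-based fusion blocks.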
Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition
Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches.
Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts
The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications).
A proposal for Multimodal Emotion Recognition using aural transformers and Action Units on RAVDESS dataset
Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models.
Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
Emotion recognition is a challenging and actively studied research area that plays a critical role in emotion-aware human-computer interaction systems.