Multimodal Emotion Recognition

57 papers with code • 3 benchmarks • 9 datasets

This is a leaderboard for multimodal emotion recognition on the IEMOCAP dataset. The modality abbreviations are A: Acoustic, T: Text, V: Visual.

Please include the modalities in brackets after the model name.

All models must use the standard five emotion categories and are evaluated with the standard leave-one-session-out (LOSO) protocol. See the individual papers for details.
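For reference, a minimal sketch of the LOSO protocol is shown below; the feature/label arrays, the train/evaluate helpers, and the exact five labels are placeholders rather than any particular paper's code.

```python
"""Minimal sketch of leave-one-session-out (LOSO) evaluation on IEMOCAP.

IEMOCAP contains five recorded sessions; each fold trains on four sessions
and tests on the held-out one. The helpers and label encoding here are
illustrative placeholders, not any paper's implementation.
"""
import numpy as np

NUM_SESSIONS = 5

def loso_evaluate(features, labels, sessions, train_model, evaluate):
    """Run one train/test round per held-out session and average accuracy.

    features : array of per-utterance feature vectors (any modality mix)
    labels   : integer emotion labels in [0, 4] for the five categories
    sessions : integer session id in [1, 5] for each utterance
    """
    accuracies = []
    for held_out in range(1, NUM_SESSIONS + 1):
        test_mask = sessions == held_out
        train_mask = ~test_mask
        model = train_model(features[train_mask], labels[train_mask])
        accuracies.append(evaluate(model, features[test_mask], labels[test_mask]))
    return float(np.mean(accuracies))
```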


Most implemented papers

Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition

wenliangdai/Modality-Transferable-MER Asian Chapter of the Association for Computational Linguistics 2020

Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationships between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially unseen ones.
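The title's idea of modality-transferable emotion embeddings can be pictured as scoring an utterance against embeddings of the emotion labels themselves, so that low-resource or unseen emotions can still be scored. The sketch below is only an illustration of that idea, not the authors' model; the label vectors, dimensions, and projection layer are assumptions.

```python
# Hedged sketch: score an utterance by cosine similarity between its projected
# multimodal representation and word embeddings of the emotion labels, which
# lets the same head score emotions that were unseen during training.
# Dimensions, the label set, and the projection design are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelEmbeddingHead(nn.Module):
    def __init__(self, feature_dim, label_embeddings):
        super().__init__()
        # label_embeddings: (num_emotions, embed_dim) word vectors for the labels
        self.register_buffer("label_embeddings", F.normalize(label_embeddings, dim=-1))
        self.proj = nn.Linear(feature_dim, label_embeddings.size(1))

    def forward(self, fused_features):
        # fused_features: (batch, feature_dim) multimodal utterance representation
        x = F.normalize(self.proj(fused_features), dim=-1)
        # cosine similarity to every label embedding -> (batch, num_emotions)
        return x @ self.label_embeddings.t()

# Usage with random stand-ins for real utterance features and label word vectors
label_vecs = torch.randn(5, 300)          # e.g. word vectors for 5 emotion words
head = LabelEmbeddingHead(feature_dim=256, label_embeddings=label_vecs)
scores = head(torch.randn(8, 256))        # (8, 5) similarity scores
```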

Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion

shamanez/Self-Supervised-Embedding-Fusion-Transformer 27 Oct 2020

Emotion Recognition is a challenging research area given its complex nature, and humans express emotional cues across various modalities such as language, facial expressions, and speech.

MSAF: Multimodal Split Attention Fusion

anita-hu/MSAF 13 Dec 2020

Multimodal learning mimics the reasoning process of the human multi-sensory system, which is used to perceive the surrounding world.

Multimodal Emotion Recognition with High-level Speech and Text Features

mmakiuchi/multimodal_emotion_recognition 29 Sep 2021

Current works train deep learning models on low-level data representations to solve the emotion recognition task.

A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition

skeletonnn/cfn-sr 3 Nov 2021

First, we perform representation learning for the audio and video modalities, obtaining semantic features for the two modalities with an efficient ResNeXt and a 1D CNN, respectively.
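A rough sketch of that overall shape follows, with placeholder encoders standing in for the paper's ResNeXt and 1D CNN, and a self-attention fusion block with a residual connection; all sizes and details here are assumptions, not the CFN-SR implementation.

```python
# Hedged sketch of a cross-modal fusion block in the spirit described above:
# each modality is encoded separately, the modality tokens pass through
# multi-head self-attention, and a residual connection preserves the original
# features. Encoders and sizes are placeholders, not CFN-SR itself.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Stand-ins for the paper's modality-specific encoders
        self.audio_enc = nn.Sequential(nn.Conv1d(40, dim, kernel_size=5, padding=2),
                                       nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.video_enc = nn.Linear(512, dim)  # assumes precomputed clip features
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, 5)   # five emotion categories

    def forward(self, audio, video):
        # audio: (batch, 40, time), e.g. MFCCs; video: (batch, 512) clip features
        a = self.audio_enc(audio).squeeze(-1)          # (batch, dim)
        v = self.video_enc(video)                      # (batch, dim)
        tokens = torch.stack([a, v], dim=1)            # (batch, 2, dim)
        fused, _ = self.attn(tokens, tokens, tokens)   # self-attention across modalities
        fused = self.norm(fused + tokens)              # residual structure
        return self.classifier(fused.mean(dim=1))      # (batch, 5) emotion logits

logits = CrossModalFusion()(torch.randn(2, 40, 120), torch.randn(2, 512))
```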

Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition

praveena2j/cross-attentional-av-fusion 9 Nov 2021

Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches.
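Cross-attentional fusion of this kind, where each modality's features attend to the other's before the joint prediction, might be sketched as follows; the feature dimensions and the two-output valence/arousal head are assumptions for illustration.

```python
# Hedged sketch of cross-attentional audio-visual fusion: audio features attend
# to visual features and vice versa, then both streams are combined for a
# dimensional (valence/arousal) prediction. Sizes and the head are illustrative.
import torch
import torch.nn as nn

class CrossAttentionAV(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, 2)   # valence and arousal

    def forward(self, audio_seq, video_seq):
        # audio_seq: (batch, Ta, dim), video_seq: (batch, Tv, dim)
        a_att, _ = self.a2v(audio_seq, video_seq, video_seq)  # audio queries visual keys/values
        v_att, _ = self.v2a(video_seq, audio_seq, audio_seq)  # visual queries audio keys/values
        pooled = torch.cat([a_att.mean(dim=1), v_att.mean(dim=1)], dim=-1)
        return self.head(pooled)            # (batch, 2) valence/arousal estimates

out = CrossAttentionAV()(torch.randn(2, 50, 128), torch.randn(2, 30, 128))
```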

Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts

exploration-lab/shapes-of-emotion MMMPIE (COLING) 2022

The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications).
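One way to picture such a plug-in component is as a gate that predicts an emotion shift between consecutive utterances and down-weights the previous emotional context accordingly; the formulation below is only illustrative, not the paper's exact design.

```python
# Hedged sketch of a plug-in emotion-shift component for conversational ERC:
# a small network predicts whether the emotion changes between consecutive
# utterances and gates how much previous context is carried forward.
import torch
import torch.nn as nn

class EmotionShiftGate(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.shift = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                   nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, prev_utt, curr_utt):
        # prev_utt, curr_utt: (batch, dim) utterance representations from any ERC backbone
        p_shift = self.shift(torch.cat([prev_utt, curr_utt], dim=-1))  # (batch, 1)
        # if a shift is likely, down-weight the previous emotional context
        context = (1.0 - p_shift) * prev_utt + curr_utt
        return context, p_shift  # p_shift can also serve as an auxiliary training target

ctx, p = EmotionShiftGate()(torch.randn(4, 256), torch.randn(4, 256))
```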

A proposal for Multimodal Emotion Recognition using aural transformers and Action Units on RAVDESS dataset

cristinalunaj/MMEmotionRecognition Applied Sciences Journal 2021

Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models.
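That static-versus-sequential comparison can be sketched on per-frame Action Unit vectors (e.g., as extracted by a tool such as OpenFace): mean-pool them for a static classifier, or feed the sequence to a recurrent model. Both heads below are illustrative stand-ins, not the paper's models.

```python
# Hedged sketch of static vs. sequential modelling of per-frame Action Units
# (e.g. 17 AU intensities per frame). Both classifiers are illustrative only.
import torch
import torch.nn as nn

NUM_AUS, NUM_EMOTIONS = 17, 8   # RAVDESS covers 8 emotion classes

class StaticAUClassifier(nn.Module):
    """Averages AUs over time, then applies an MLP (the 'static' model)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(NUM_AUS, 64), nn.ReLU(),
                                 nn.Linear(64, NUM_EMOTIONS))

    def forward(self, au_seq):                 # au_seq: (batch, frames, NUM_AUS)
        return self.mlp(au_seq.mean(dim=1))

class SequentialAUClassifier(nn.Module):
    """Feeds the AU sequence to a GRU (the 'sequential' model)."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(NUM_AUS, 64, batch_first=True)
        self.out = nn.Linear(64, NUM_EMOTIONS)

    def forward(self, au_seq):
        _, h = self.gru(au_seq)                # h: (1, batch, 64)
        return self.out(h.squeeze(0))

x = torch.randn(2, 90, NUM_AUS)               # ~3 s of video at 30 fps
static_logits, seq_logits = StaticAUClassifier()(x), SequentialAUClassifier()(x)
```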

Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition

ppfliu/emotion-recognition 17 Jan 2022

Emotion recognition is a challenging and actively-studied research area that plays a critical role in emotion-aware human-computer interaction systems.