MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

31 Mar 2022 · Sreyan Ghosh, Utkarsh Tyagi, S Ramaneswaran, Harshvardhan Srivastava, Dinesh Manocha

In this paper, we propose MMER, a novel multimodal multi-task learning approach for Speech Emotion Recognition (SER). MMER leverages a multimodal network based on early fusion and cross-modal self-attention between the text and acoustic modalities, and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and analysis of the results to demonstrate the effectiveness of our proposed approach.
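The abstract describes cross-modal self-attention between text and acoustic features, fused early and trained with a main emotion-classification loss plus auxiliary losses. Below is a minimal sketch (not the authors' implementation) of that pattern: the encoder choices, feature dimensions, auxiliary head, and loss weight are illustrative assumptions.

```python
# Hedged sketch: cross-modal attention fusion with a multi-task objective.
# All dimensions, the auxiliary head, and the 0.1 loss weight are assumptions.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, d_model=768, n_heads=8, n_classes=4):
        super().__init__()
        # Text queries attend over acoustic keys/values, and the reverse.
        self.text_to_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.emotion_head = nn.Linear(2 * d_model, n_classes)  # main SER task
        self.aux_head = nn.Linear(2 * d_model, d_model)        # placeholder auxiliary-task head

    def forward(self, text_feats, audio_feats):
        # text_feats: (B, T_text, d), audio_feats: (B, T_audio, d)
        t2a, _ = self.text_to_audio(text_feats, audio_feats, audio_feats)
        a2t, _ = self.audio_to_text(audio_feats, text_feats, text_feats)
        # Pool each attended sequence and concatenate the two views (early fusion).
        fused = torch.cat([t2a.mean(dim=1), a2t.mean(dim=1)], dim=-1)
        return self.emotion_head(fused), self.aux_head(fused)

model = CrossModalFusion()
text = torch.randn(2, 20, 768)    # e.g., outputs of a text encoder (assumed shape)
audio = torch.randn(2, 200, 768)  # e.g., projected outputs of a speech encoder (assumed shape)
logits, aux = model(text, audio)
# Multi-task objective: main cross-entropy plus a weighted auxiliary term (weight assumed).
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1])) + 0.1 * aux.pow(2).mean()
```

The two attention directions let each modality condition on the other before fusion; in the paper, the auxiliary heads correspond to the three auxiliary tasks trained jointly with emotion classification.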


Results from the Paper


Ranked #2 on Speech Emotion Recognition on IEMOCAP (using extra training data)

Task: Speech Emotion Recognition
Dataset: IEMOCAP
Model: MMER
Metric: WA (weighted accuracy) = 0.817
Global Rank: #2
Uses Extra Training Data: Yes
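For reference, WA on IEMOCAP is typically the fraction of utterances classified correctly (as opposed to UA, the average per-class recall). A tiny illustrative computation, with made-up labels:

```python
# Illustrative WA (weighted accuracy) computation; labels below are made up.
import numpy as np

y_true = np.array([0, 1, 1, 2, 3, 0])
y_pred = np.array([0, 1, 2, 2, 3, 0])
wa = (y_true == y_pred).mean()  # fraction of correctly classified utterances
print(f"WA = {wa:.3f}")
```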

Methods


No methods listed for this paper.