Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition

Emotion recognition remains a challenging task due to speaker variability and the scarcity of labeled training data. To address these difficulties, we adopt the domain adversarial neural network (DANN) for emotion recognition. The primary task is to predict emotion labels; the secondary, adversarial task is to learn a shared representation in which speaker identities cannot be distinguished. This brings the representations of different speakers closer together, and by incorporating unlabeled data into training it alleviates the impact of limited labeled samples. Prior work has also shown that contextual information and multimodal features are important for emotion recognition, yet previous DANN-based approaches ignore them, limiting their performance. In this paper, we propose a context-dependent domain adversarial neural network for multimodal emotion recognition. To verify the effectiveness of the proposed method, we conduct experiments on the benchmark IEMOCAP dataset. Experimental results show an absolute improvement of 3.48% over state-of-the-art strategies.
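As a rough illustration of the training setup described above, the sketch below shows a DANN with a gradient reversal layer for the adversarial speaker task and a recurrent layer for utterance context. The module names, feature dimensions, and the choice of a BiGRU context encoder are assumptions for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass,
    gradients multiplied by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class ContextDANN(nn.Module):
    """Hypothetical sketch: a BiGRU over utterance-level multimodal features
    provides conversational context; a gradient-reversed speaker head pushes
    the shared representation to be speaker-invariant."""
    def __init__(self, feat_dim, hidden_dim, n_emotions, n_speakers, lambd=1.0):
        super().__init__()
        self.context_rnn = nn.GRU(feat_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
        self.emotion_head = nn.Linear(2 * hidden_dim, n_emotions)   # primary task
        self.speaker_head = nn.Linear(2 * hidden_dim, n_speakers)   # adversarial task
        self.lambd = lambd

    def forward(self, x):
        # x: (batch, n_utterances, feat_dim) fused audio/text features per utterance
        h, _ = self.context_rnn(x)                      # context-dependent representation
        emo_logits = self.emotion_head(h)               # emotion prediction per utterance
        spk_logits = self.speaker_head(GradReverse.apply(h, self.lambd))
        return emo_logits, spk_logits
```

In a setup like this, the emotion loss is computed only on labeled utterances, while the gradient-reversed speaker loss can also be computed on unlabeled utterances, which is one way such a model can exploit extra training data.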


Datasets

IEMOCAP
Results from the Paper


 Ranked #1 on Speech Emotion Recognition on IEMOCAP (using extra training data)

Task: Speech Emotion Recognition
Dataset: IEMOCAP
Model: DANN
Uses Extra Training Data: Yes

Metric    Metric Value    Global Rank
F1        -               #2
WA        0.827           #1
UA        -               #7

Methods


No methods listed for this paper.