TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Emotion Recognition	RAVDESS	AlexNet (Fine-Tuning)	Accuracy	61.67%	# 5
Speech Emotion Recognition	RAVDESS	CNN-14 (Fine-Tuning)	Accuracy	76.58%	# 4
Emotion Recognition	RAVDESS	Logistic Regression on posteriors of the CNN-14&biLSTM-GuidedST	Accuracy	80.08%	# 3
Facial Emotion Recognition	RAVDESS	Guided-ST and bi-LSTM with attention	Accuracy	57.08%	# 3

Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Sensors 2021 · Cristina Luna-Jiménez, David Griol, Zoraida Callejas, Ricardo Kleinlein, Juan M. Montero, Fernando Fernández-Martínez

Emotion recognition is attracting the attention of the research community because of the many areas where it can be applied, such as healthcare and road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, specifically embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. For the facial emotion recognizer, we propose a framework that consists of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that frame-based systems can present problems when used directly to solve a video-based task, even after domain adaptation, which opens a new line of research into ways to correct this mismatch and take advantage of the knowledge embedded in these pre-trained models. Finally, by combining the two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-fold cross-validation (5-CV) evaluation, classifying eight emotions. The results show that both modalities carry relevant information for detecting users’ emotional state and that combining them improves system performance.
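
The abstract mentions a late fusion strategy and a subject-wise 5-CV evaluation without spelling out either, and the results table names the fused model as a logistic regression on the posteriors of the two unimodal systems. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes that fusion means concatenating the per-class posteriors of the speech and facial models, and that folds are formed by assigning actors to folds round-robin. All function names are hypothetical, and the toy posteriors in the demo are made up and use three classes for brevity.

```python
import math


def subject_wise_folds(actor_ids, k=5):
    """Return k (train, test) index splits in which no actor is on both sides."""
    actors = sorted(set(actor_ids))
    # Assumption: round-robin assignment of actors to folds.
    fold_of_actor = {actor: i % k for i, actor in enumerate(actors)}
    folds = []
    for fold in range(k):
        train = [i for i, a in enumerate(actor_ids) if fold_of_actor[a] != fold]
        test = [i for i, a in enumerate(actor_ids) if fold_of_actor[a] == fold]
        folds.append((train, test))
    return folds


def fuse(speech_posterior, face_posterior):
    """Late-fusion input: concatenate the class posteriors of both modalities."""
    return list(speech_posterior) + list(face_posterior)


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, bias, x):
    scores = [b + sum(wj * xj for wj, xj in zip(w, x))
              for w, b in zip(weights, bias)]
    return softmax(scores)


def predict(weights, bias, x):
    probs = predict_proba(weights, bias, x)
    return max(range(len(probs)), key=lambda c: probs[c])


def train_logistic_regression(features, labels, num_classes, epochs=200, lr=0.5):
    """Multinomial logistic regression fitted by plain batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = predict_proba(weights, bias, x)
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            bias[c] -= lr * grad_b[c] / n
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, bias


if __name__ == "__main__":
    # Made-up 3-class posteriors; the paper itself classifies eight emotions.
    labels = [0, 1, 2]
    speech = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    face = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
    fused = [fuse(s, f) for s, f in zip(speech, face)]
    weights, bias = train_logistic_regression(fused, labels, num_classes=3)
    print([predict(weights, bias, x) for x in fused])
```

In a real experiment the fusion classifier would be fitted only on the training indices of each fold from `subject_wise_folds` and scored on the held-out actors, so that no speaker contributes to both sides of a split.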

Code

No code implementations yet.

Tasks

Domain Adaptation

Emotion Recognition

Facial Emotion Recognition

Multimodal Emotion Recognition

Speech Emotion Recognition

Transfer Learning

Datasets

ImageNet

IEMOCAP

AffectNet

RAVDESS

Results from the Paper

Ranked #3 on Emotion Recognition on RAVDESS (using extra training data)

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Spatial Transformer • Transformer
