emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition

21 Mar 2024  ·  Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Berrak Sisman, Björn W. Schuller, Carlos Busso

Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has improved significantly. However, designing an optimal DL architecture requires specialised knowledge and experimental assessment. Fortunately, Neural Architecture Search (NAS) offers a way to determine the best DL model automatically, and Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports coupling a CNN with an LSTM to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique introduces a novel mechanism for selecting CNN and SeqNN operations jointly with DARTS. Unlike earlier work, we do not impose constraints on the layer order of the CNN; instead, we let DARTS choose the best layer order within the DARTS cell. Evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets, we demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM.
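To illustrate the core idea of searching over CNN and SeqNN operations jointly, the sketch below shows a DARTS-style mixed operation whose candidate set contains both convolutional and recurrent (LSTM/RNN) operations, combined by a softmax over learnable architecture weights. This is not the authors' implementation: the operation choices, tensor shapes, and hyperparameters are illustrative assumptions only.

```python
# Minimal DARTS-style mixed operation over a joint CNN + SeqNN candidate set.
# Illustrative sketch only; operation set, shapes, and init are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeqOp(nn.Module):
    """Wraps an LSTM/RNN so it maps (batch, channels, time) -> (batch, channels, time)."""
    def __init__(self, channels, rnn_cls=nn.LSTM):
        super().__init__()
        self.rnn = rnn_cls(input_size=channels, hidden_size=channels, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x.transpose(1, 2))   # (batch, time, channels)
        return out.transpose(1, 2)             # back to (batch, channels, time)


class MixedOp(nn.Module):
    """Softmax-weighted sum over CNN and SeqNN candidates (continuous relaxation)."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),  # CNN candidate
            nn.Conv1d(channels, channels, kernel_size=5, padding=2),  # CNN candidate
            SeqOp(channels, nn.LSTM),                                 # sequential candidate
            SeqOp(channels, nn.RNN),                                  # sequential candidate
            nn.Identity(),                                            # skip connection
        ])
        # Architecture parameters (alpha), optimised on validation data in DARTS.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))


if __name__ == "__main__":
    x = torch.randn(4, 32, 100)        # (batch, channels, time), e.g. pooled spectrogram features
    print(MixedOp(32)(x).shape)        # torch.Size([4, 32, 100])
```

After search, the usual DARTS discretisation step would keep only the highest-weighted operation on each edge, so a single cell can end up mixing convolutional and recurrent layers in whatever order the search finds best.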


Results from the Paper


Task                        Dataset      Model     Metric  Value    Global Rank
Speech Emotion Recognition  IEMOCAP      emoDARTS  WA      0.7803   #7
Speech Emotion Recognition  IEMOCAP      emoDARTS  UA      0.7655   #2
Speech Emotion Recognition  MSP-IMPROV   emoDARTS  UA      0.6563   #1
