CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation

15 Feb 2018 · Caroline Etienne, Guillaume Fidanza, Andrei Petrovskii, Laurence Devillers, Benoit Schmauch

In this work we design a neural network for recognizing emotions in speech, using the IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture involving both convolutional layers, for extracting high-level features from raw spectrograms, and recurrent ones for aggregating long-term dependencies...


Results from the Paper


| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Speech Emotion Recognition | IEMOCAP | | UA | 0.653 | # 2 |
| Speech Emotion Recognition | IEMOCAP | CNN+LSTM | UA | 0.65 | # 3 |
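
The page does not define the UA metric reported above. In speech emotion recognition work, UA conventionally stands for unweighted accuracy: the mean of the per-class recalls, so that each emotion class counts equally regardless of how many utterances it has. Assuming that is the meaning here, a minimal sketch of the computation follows. The function name `unweighted_accuracy` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of each class that occurs in y_true, averaged without weighting.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels, chosen only to show the class-imbalance effect.
y_true = ["neutral", "neutral", "neutral", "neutral", "angry", "angry"]
y_pred = ["neutral", "neutral", "neutral", "angry", "angry", "sad"]

ua = unweighted_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UA = {ua:.3f}, plain accuracy = {plain:.3f}")
```

In this toy case the recall is 3/4 for "neutral" and 1/2 for "angry", so UA is their mean, while plain accuracy (4 correct out of 6) is pulled toward the larger class.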
