Speech Emotion Recognition

77 papers with code • 14 benchmarks • 16 datasets

Speech Emotion Recognition is a task in speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine a speaker's emotional state (for example happiness, anger, sadness, or frustration) from acoustic cues in their speech, such as prosody, pitch, and rhythm.
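As a rough illustration of the kind of acoustic cues mentioned above, the following minimal sketch computes two simple frame-level features with NumPy: root-mean-square energy and a pitch (F0) estimate taken from the autocorrelation peak. The function names `frame_energy` and `estimate_pitch`, the frame sizes, and the 50-500 Hz search range are assumptions chosen for this example; autocorrelation is only one basic way to estimate pitch and is not the method of any paper listed on this page.

```python
import numpy as np


def frame_energy(signal, frame_len, hop):
    """Return the root-mean-square energy of each frame (hypothetical helper)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sqrt(np.mean(signal[i * hop: i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])


def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate F0 in Hz from the autocorrelation peak between fmin and fmax."""
    frame = frame - np.mean(frame)  # remove DC offset
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)                     # shortest period searched
    lag_max = min(int(sample_rate / fmin), len(ac) - 1)   # longest period searched
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / lag


if __name__ == "__main__":
    # Synthetic stand-in for a voiced speech frame: a 200 Hz tone, 100 ms at 16 kHz.
    sr = 16000
    t = np.arange(1600) / sr
    tone = np.sin(2 * np.pi * 200.0 * t)
    print("F0 estimate (Hz):", estimate_pitch(tone, sr))
    print("RMS energy per frame:", frame_energy(tone, frame_len=400, hop=160))
```

On real recordings, features like these would typically be computed per frame and summarized per utterance before being passed to a classifier; unvoiced or noisy frames need extra handling that this sketch omits.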

For multimodal emotion recognition, please upload your results to Multimodal Emotion Recognition on IEMOCAP.



Most implemented papers

Attention Is All You Need

tensorflow/tensor2tensor NeurIPS 2017

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

Continuous control with deep reinforcement learning

ray-project/ray 9 Sep 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.

Multimodal Speech Emotion Recognition and Ambiguity Resolution

Demfier/multimodal-speech-emotion-recognition 12 Apr 2019

In this work, we adopt a feature-engineering based approach to tackle the task of speech emotion recognition.

Multimodal Speech Emotion Recognition Using Audio and Text

david-yoon/multimodal-speech-emotion 10 Oct 2018

Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers.

Compact Graph Architecture for Speech Emotion Recognition

AmirSh15/Compact_SER 5 Aug 2020

We propose a deep graph approach to address the task of speech emotion recognition.

Speech Emotion Recognition Using Multi-hop Attention Mechanism

raulsteleac/Speech_Emotion_Recognition IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019

As opposed to using knowledge from both modalities separately, we propose a framework to exploit acoustic information in tandem with lexical data.

Deep Learning based Emotion Recognition System Using Speech Features and Transcriptions

MagnusXu/Speech-Emotion-Recognition-Capstone-Project 11 Jun 2019

This paper proposes a speech emotion recognition method based on speech features and speech transcriptions (text).

Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset

HLTSingapore/Emotional-Speech-Data 28 Oct 2020

Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.

AST: Audio Spectrogram Transformer

YuanGongND/ast 5 Apr 2021

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings

habla-liaa/ser-with-w2v2 8 Apr 2021

Emotion recognition datasets are relatively small, making the use of the more sophisticated deep learning approaches challenging.