Speech Emotion Recognition
77 papers with code • 14 benchmarks • 16 datasets
Speech Emotion Recognition is a task in speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to infer the speaker's emotional state, such as happiness, anger, sadness, or frustration, from cues in the speech signal such as prosody, pitch, and rhythm.
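As a minimal illustration of the low-level cues mentioned above, the sketch below (an assumption of this write-up, not taken from any listed paper) computes two classic paralinguistic features over a synthetic waveform: short-time energy, a loudness cue correlated with arousal, and zero-crossing rate, a crude proxy for spectral content.

```python
import math

def frame_features(signal, frame_len=400, hop=160):
    """Compute short-time energy and zero-crossing rate per frame.

    Energy tracks loudness (a prosodic cue for arousal); the
    zero-crossing rate is a rough proxy for spectral content.
    At 16 kHz, frame_len=400 and hop=160 give 25 ms frames
    with a 10 ms hop, a common analysis setting.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        ) / (frame_len - 1)
        feats.append((energy, zcr))
    return feats

# A 0.1 s, 220 Hz sine at 16 kHz stands in for a voiced speech segment.
sr = 16000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(int(0.1 * sr))]
feats = frame_features(tone)
```

Real systems feed far richer feature sets (pitch contours, MFCCs, learned embeddings) into a classifier, but the frame-by-frame structure is the same.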
For multimodal emotion recognition, please upload your results to the Multimodal Emotion Recognition on IEMOCAP leaderboard.
These leaderboards are used to track progress in Speech Emotion Recognition.
Libraries
Use these libraries to find Speech Emotion Recognition models and implementations.
Most implemented papers
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.
Continuous control with deep reinforcement learning
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.
Multimodal Speech Emotion Recognition and Ambiguity Resolution
In this work, we adopt a feature-engineering based approach to tackle the task of speech emotion recognition.
Multimodal Speech Emotion Recognition Using Audio and Text
Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers.
Compact Graph Architecture for Speech Emotion Recognition
We propose a deep graph approach to address the task of speech emotion recognition.
Speech Emotion Recognition Using Multi-hop Attention Mechanism
As opposed to using knowledge from both the modalities separately, we propose a framework to exploit acoustic information in tandem with lexical data.
Deep Learning based Emotion Recognition System Using Speech Features and Transcriptions
This paper proposes a speech emotion recognition method based on speech features and speech transcriptions (text).
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset
Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.
AST: Audio Spectrogram Transformer
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.
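Both CNN baselines and AST consume a time-frequency image of the audio. As a sketch of that input representation only (not of the AST model itself), the snippet below builds a naive magnitude spectrogram with a direct DFT; production pipelines instead use FFT-based log-mel filterbanks, but the frames-by-frequency-bins layout is the same.

```python
import math

def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Naive magnitude spectrogram: sliding frames -> direct DFT.

    Real systems use FFTs plus log-mel filterbanks; this O(N^2)
    DFT just shows the 2-D (frames x frequency bins) layout that
    spectrogram-based models take as input.
    """
    spec = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        bins = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies
            re = sum(x * math.cos(2 * math.pi * k * n / frame_len)
                     for n, x in enumerate(frame))
            im = -sum(x * math.sin(2 * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            bins.append(math.hypot(re, im))
        spec.append(bins)
    return spec  # shape: (num_frames, frame_len // 2 + 1)

# A pure tone at exactly bin 4 (sr/64 * 4 = 500 Hz at sr = 8000)
# concentrates its energy in one frequency bin per frame.
sr = 8000
tone = [math.sin(2 * math.pi * 500 * t / sr) for t in range(256)]
spec = magnitude_spectrogram(tone)
```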
Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings
Emotion recognition datasets are relatively small, making the use of the more sophisticated deep learning approaches challenging.
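When labeled data is scarce, a common pattern is to pool frame-level self-supervised embeddings (such as wav2vec 2.0 features) into one utterance vector and train only a light classifier on top. The sketch below illustrates that pattern with made-up 2-D toy vectors standing in for real embeddings, and a nearest-centroid rule standing in for the small classifier (logistic regression or an SVM would be typical choices).

```python
import math

def mean_pool(frames):
    """Average frame-level embedding vectors into one utterance vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def nearest_centroid(train, query):
    """Classify a pooled vector by its closest class centroid.

    train maps each emotion label to a list of pooled utterance
    vectors; only this tiny classifier is fit, while the heavy
    embedding model stays frozen.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    centroids = {label: mean_pool(vecs) for label, vecs in train.items()}
    return min(centroids, key=lambda label: dist(centroids[label], query))

# Hypothetical toy data: 2-D stand-ins for pooled wav2vec 2.0 embeddings.
train = {
    "angry":   [[0.9, 0.1], [1.0, 0.2]],
    "neutral": [[0.1, 0.9], [0.2, 1.0]],
}
utterance_frames = [[0.8, 0.2], [1.0, 0.0], [0.9, 0.1]]  # frame embeddings
label = nearest_centroid(train, mean_pool(utterance_frames))
```

Freezing the embedding model and fitting only the small head is what makes this workable on corpora with just a few thousand utterances.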