In this paper, we apply multiscale area attention in a deep convolutional neural network to attend to emotional characteristics at varied granularities, so that the classifier can benefit from an ensemble of attentions at different scales.
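The core idea can be illustrated with a minimal sketch. The code below is not the paper's implementation; it assumes one common formulation of area attention, in which keys and values are pooled over contiguous spans ("areas") of several sizes along the time axis, and a query then attends over the union of all areas, so that a single attention pass mixes information at multiple scales.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def area_attention(query, keys, values, max_area=3):
    """Hedged sketch of single-query area attention.

    keys, values: (T, d) time-major feature sequences.
    Areas are all contiguous spans of length 1..max_area; each area's
    key/value is the mean over the span (mean pooling is an assumption,
    not necessarily the paper's choice).
    """
    T, d = keys.shape
    area_keys, area_vals = [], []
    for size in range(1, max_area + 1):
        for start in range(T - size + 1):
            area_keys.append(keys[start:start + size].mean(axis=0))
            area_vals.append(values[start:start + size].mean(axis=0))
    K = np.stack(area_keys)          # (num_areas, d)
    V = np.stack(area_vals)          # (num_areas, d)
    scores = softmax(K @ query / np.sqrt(d))  # scaled dot-product over areas
    return scores @ V                # (d,) context vector

# Example: 6 time steps, 4-dim features, areas of size 1..3 (15 areas total).
rng = np.random.default_rng(0)
ctx = area_attention(rng.standard_normal(4),
                     rng.standard_normal((6, 4)),
                     rng.standard_normal((6, 4)),
                     max_area=3)
```

Varying `max_area` changes the largest granularity a single head can pool over, which is one way to obtain the multiscale ensemble described above.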
Speech emotion recognition is a challenging task, and building well-performing classifiers has relied heavily on models that use audio features.
In this work, we adopt a feature-engineering based approach to tackle the task of speech emotion recognition.
In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes.
Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.
Multimodal emotion recognition from speech is an important area in affective computing.
Cross-lingual speech emotion recognition is an important task for practical applications.
In this work, we propose an interaction-aware attention network (IAAN) that incorporates contextual information into the learned vocal representation through a novel attention mechanism.
In this work, we explore the impact of the visual modality, in addition to speech and text, on improving the accuracy of the emotion detection system.