Shallow over Deep Neural Networks: An empirical analysis for human emotion classification using audio data

Human emotions can be identified in numerous ways, ranging from the tonal properties of speech to the facial expressions formed before speech delivery, and even body gestures that can suggest various emotions without a word being spoken. Knowing an individual's true emotions helps us understand the situation and react to it. This is equally true for many feedback systems used in day-to-day communication with humans, particularly those used in smart home solutions. Automated emotion recognition draws on several fields of research, from computer vision and physiology to artificial intelligence. This work focuses on classifying emotions into eight categories (neutral, happy, sad, angry, calm, fearful, disgust, and surprised) based on how sentences are spoken, using the “Ryerson Audio-Visual Database of Emotional Speech and Song” (RAVDESS). We propose a novel approach for emotion classification of audio conversations based on speech signals. Emotion classification based on acoustic properties is independent of the spoken language and can therefore be used for cross-language emotion classification. The aim of this contribution is to develop a system capable of automatically recognising emotions from real-time speech. We performed several simulations and achieved the highest accuracy of 82.99% with our shallow CNN model.
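The abstract does not spell out the acoustic features or the exact CNN-X architecture, so the sketch below is only a hypothetical illustration of the general approach: extract a fixed-size spectral representation (here, MFCCs via librosa) from each RAVDESS clip and feed it to a shallow convolutional classifier over the eight emotion classes. The feature choice, layer sizes, and hyperparameters are assumptions for illustration, not the authors' configuration.

```python
# Hypothetical sketch of a shallow CNN for 8-class speech emotion recognition.
# MFCC features and the single-conv-block architecture are assumptions; the
# paper's abstract does not specify the authors' actual features or layers.
import numpy as np
import librosa
import tensorflow as tf

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def extract_mfcc(path, sr=22050, n_mfcc=40, max_frames=174):
    """Load one audio clip and return a fixed-size MFCC matrix with a channel axis."""
    audio, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate along the time axis so every clip has the same shape.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :max_frames]
    return mfcc[..., np.newaxis]

def build_shallow_cnn(input_shape=(40, 174, 1), num_classes=len(EMOTIONS)):
    """A shallow CNN: one convolutional block followed by a small dense head."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                               input_shape=input_shape),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_shallow_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training would use MFCC arrays from extract_mfcc() and integer emotion labels:
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
```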


Datasets

RAVDESS
Results from the Paper


Task:    Speech Emotion Recognition
Dataset: RAVDESS
Model:   CNN-X (Shallow CNN)

Metric       Value     Global Rank
Accuracy     82.99%    #2
F1 Score     0.82      #1
Precision    0.82      #1
Recall       0.82      #1
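As a point of reference, a minimal sketch of how metrics of this kind (accuracy plus macro-averaged F1, precision, and recall) are typically computed from held-out predictions with scikit-learn; the toy labels below are purely illustrative and are not taken from the paper.

```python
# Illustrative only: computing accuracy, macro F1, precision, and recall
# from ground-truth and predicted emotion labels (toy values shown).
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 2, 3, 0, 1, 2, 3]   # ground-truth emotion class indices
y_pred = [0, 1, 2, 3, 0, 1, 2, 2]   # model predictions for the same clips

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred, average="macro"))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
```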

Methods


No methods listed for this paper.