RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song)

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7,356 files (total size: 24.8 GB). It features 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. Speech covers calm, happy, sad, angry, fearful, surprise, and disgust expressions; song covers calm, happy, sad, angry, and fearful expressions. Each expression is produced at two levels of emotional intensity (normal, strong), and there is an additional neutral expression. All conditions are available in three modality formats: audio-only (16-bit, 48 kHz, .wav), audio-video (720p H.264 video, AAC 48 kHz audio, .mp4), and video-only (no sound). Note that there are no song files for Actor_18.
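
As a sanity check on the figures above, the following sketch rebuilds the 7,356-file total from the listed conditions. One factor is not stated in the description and is an assumption here: that each actor records every condition twice (`REPETITIONS = 2`). All names in the code are illustrative, not part of any RAVDESS tooling.

```python
# Conditions taken from the dataset description.
SPEECH_EMOTIONS = ["calm", "happy", "sad", "angry", "fearful", "surprise", "disgust"]
SONG_EMOTIONS = ["calm", "happy", "sad", "angry", "fearful"]
INTENSITIES = 2        # normal, strong
STATEMENTS = 2         # two lexically matched statements
MODALITIES = 3         # audio-only, audio-video, video-only
ACTORS = 24
ACTORS_WITH_SONG = 23  # no song files for Actor_18

# ASSUMPTION (not stated in the description): every condition is recorded twice.
REPETITIONS = 2


def trials_per_actor(emotions):
    """Count one actor's recordings in a single modality for one vocal channel."""
    # Each emotion at two intensities, plus the single neutral expression.
    expressions = len(emotions) * INTENSITIES + 1
    return expressions * STATEMENTS * REPETITIONS


speech_files = trials_per_actor(SPEECH_EMOTIONS) * ACTORS * MODALITIES
song_files = trials_per_actor(SONG_EMOTIONS) * ACTORS_WITH_SONG * MODALITIES
total_files = speech_files + song_files

print(speech_files, song_files, total_files)  # 4320 3036 7356
```

Under that assumption the breakdown is 60 speech trials and 44 song trials per actor per modality, which reproduces the stated total exactly.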

Paper: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English

Benchmarks

Task	Dataset Variant	Best Model
Speech Emotion Recognition	RAVDESS	VQ-MAE-S-12
Emotion Recognition	RAVDESS	LogisticRegression on posteriors of xlsr-Wav2Vec2.0&bi-LSTM+Attention
Facial Emotion Recognition	RAVDESS	IT4, visual only
Audio Classification	RAVDESS	ASM-RH-A
Emotion Classification	RAVDESS	ERANN-0-4
Facial Expression Recognition (FER)	RAVDESS	EmoAffectNet LSTM