🔔 Share your dataset with the ML community!

Filter by Modality

Filter by Task (clear)

Filter by Language

29 dataset results for Facial Expression Recognition (FER)

AffectNet is a large facial expression dataset with around 0.4 million images manually labeled for the presence of eight (neutral, happy, angry, sad, fear, surprise, disgust, contempt) facial expressions along with the intensity of valence and arousal.

266 PAPERS • 3 BENCHMARKS

CK+ (Extended Cohn-Kanade dataset)

The Extended Cohn-Kanade (CK+) dataset contains 593 video sequences from a total of 123 different subjects, ranging from 18 to 50 years of age with a variety of genders and heritage. Each video shows a facial shift from the neutral expression to a targeted peak expression, recorded at 30 frames per second (FPS) with a resolution of either 640x490 or 640x480 pixels. Out of these videos, 327 are labelled with one of seven expression classes: anger, contempt, disgust, fear, happiness, sadness, and surprise. The CK+ database is widely regarded as the most extensively used laboratory-controlled facial expression classification database available, and is used in the majority of facial expression classification methods.

220 PAPERS • 2 BENCHMARKS

FER2013 (Facial Expression Recognition 2013 Dataset)

Fer2013 contains approximately 30,000 facial RGB images of different expressions with size restricted to 48×48, and the main labels of it can be divided into 7 types: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral. The Disgust expression has the minimal number of images – 600, while other labels have nearly 5,000 samples each.

137 PAPERS • 4 BENCHMARKS

DISFA (Denver Intensity of Spontaneous Facial Action)

The Denver Intensity of Spontaneous Facial Action (DISFA) dataset consists of 27 videos of 4844 frames each, with 130,788 images in total. Action unit annotations are on different levels of intensity, which are ignored in the following experiments and action units are either set or unset. DISFA was selected from a wider range of databases popular in the field of facial expression recognition because of the high number of smiles, i.e. action unit 12. In detail, 30,792 have this action unit set, 82,176 images have some action unit(s) set and 48,612 images have no action unit(s) set at all.

136 PAPERS • 3 BENCHMARKS

RAF-DB (Real-world Affective Faces)

The Real-world Affective Faces Database (RAF-DB) is a dataset for facial expression. It contains 29672 facial images tagged with basic or compound expressions by 40 independent taggers. Images in this database are of great variability in subjects' age, gender and ethnicity, head poses, lighting conditions, occlusions, (e.g. glasses, facial hair or self-occlusion), post-processing operations (e.g. various filters and special effects), etc.

131 PAPERS • 2 BENCHMARKS

FER+ (Face Expression Recognition Plus dataset)

The FER+ dataset is an extension of the original FER dataset, where the images have been re-labelled into one of 8 emotion types: neutral, happiness, surprise, sadness, anger, disgust, fear, and contempt.

106 PAPERS • 2 BENCHMARKS

BP4D

The BP4D-Spontaneous dataset is a 3D video database of spontaneous facial expressions in a diverse group of young adults. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground-truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The database includes forty-one participants (23 women, 18 men). They were 18 – 29 years of age; 11 were Asian, 6 were African-American, 4 were Hispanic, and 20 were Euro-American. An emotion elicitation protocol was designed to elicit emotions of participants effectively. Eight tasks were covered with an interview process and a series of activities to elicit eight emotions. The database is structured by participants. Each participant is associated with 8 tasks. For each task, there are both 3D and 2D videos. As well, the Metadata include manually annotated

96 PAPERS • 3 BENCHMARKS

JAFFE (Japanese Female Facial Expression)

The JAFFE dataset consists of 213 images of different facial expressions from 10 different Japanese female subjects. Each subject was asked to do 7 facial expressions (6 basic facial expressions and neutral) and the images were annotated with average semantic ratings on each facial expression by 60 annotators.

86 PAPERS • 4 BENCHMARKS

Oulu-CASIA (Oulu-CASIA NIR&VIS facial expression database)

The Oulu-CASIA NIR&VIS facial expression database consists of six expressions (surprise, happiness, sadness, anger, fear and disgust) from 80 people between 23 and 58 years old. 73.8% of the subjects are males. The subjects were asked to sit on a chair in the observation room in a way that he/ she is in front of camera. Camera-face distance is about 60 cm. Subjects were asked to make a facial expression according to an expression example shown in picture sequences. The imaging hardware works at the rate of 25 frames per second and the image resolution is 320 × 240 pixels.

78 PAPERS • 4 BENCHMARKS

RaFD (Radboud Faces Database)

The Radboud Faces Database (RaFD) is a set of pictures of 67 models (both adult and children, males and females) displaying 8 emotional expressions.

77 PAPERS • 2 BENCHMARKS

MMI (MMI Facial Expression Database)

The MMI Facial Expression Database consists of over 2900 videos and high-resolution still images of 75 subjects. It is fully annotated for the presence of AUs in videos (event coding), and partially coded on frame-level, indicating for each frame whether an AU is in either the neutral, onset, apex or offset phase. A small part was annotated for audio-visual laughters.

59 PAPERS • 1 BENCHMARK

Aff-Wild2

Aff-Wild2 is an extension of the Aff-Wild dataset for affect recognition. It approximately doubles the number of included video frames and the number of subjects; thus, improving the variability of the included behaviors and of the involved persons.

56 PAPERS • 1 BENCHMARK

SFEW (Static Facial Expression in the Wild)

The Static Facial Expressions in the Wild (SFEW) dataset is a dataset for facial expression recognition. It was created by selecting static frames from the AFEW database by computing key frames based on facial point clustering. The most commonly used version, SFEW 2.0, was the benchmarking data for the SReco sub-challenge in EmotiW 2015. SFEW 2.0 has been divided into three sets: Train (958 samples), Val (436 samples) and Test (372 samples). Each of the images is assigned to one of seven expression categories, i.e., anger, disgust, fear, neutral, happiness, sadness, and surprise. The expression labels of the training and validation sets are publicly available, whereas those of the testing set are held back by the challenge organizer.

56 PAPERS • 1 BENCHMARK

ExpW (Expression in-the-Wild)

The Expression in-the-Wild (ExpW) dataset is for facial expression recognition and contains 91,793 faces manually labeled with expressions. Each of the face images is annotated as one of the seven basic expression categories: “angry”, “disgust”, “fear”, “happy”, “sad”, “surprise”, or “neutral”.

34 PAPERS • NO BENCHMARKS YET

DFEW

DFEW (Dynamic Facial Expression in the Wild)

Recently, facial expression recognition (FER) in the wild has gained a lot of researchers’ attention because it is a valuable topic to enable the FER techniques to move from the laboratory to the real applications. In this paper, we focus on this challenging but interesting topic and make contributions from three aspects. First, we present a new large-scale ’in-the-wild’ dynamic facial expression database, DFEW (Dynamic Facial Expression in the Wild), consisting of over 16,000 video clips from thousands of movies. These video clips contain various challenging interferences in practical scenarios such as extreme illumination, occlusions, and capricious pose changes. Second, we propose a novel method called ExpressionClustered Spatiotemporal Feature Learning (EC-STFL) framework to deal with dynamic FER in the wild. Third, we conduct extensive benchmark experiments on DFEW using a lot of spatiotemporal deep feature learning methods as well as our proposed EC-STFL. Experimental results sho

23 PAPERS • 1 BENCHMARK

CREMA-D

CREMA-D is an emotional multimodal actor data set of 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74 coming from a variety of races and ethnicities (African America, Asian, Caucasian, Hispanic, and Unspecified).

20 PAPERS • 7 BENCHMARKS

RAVDESS

RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song)

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7,356 files (total size: 24.8 GB). The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.

18 PAPERS • 6 BENCHMARKS

FERV39k

Current benchmarks for facial expression recognition (FER) mainly focus on static images, while there are limited datasets for FER in videos. It is still ambiguous to evaluate whether performances of existing methods remain satisfactory in real-world application-oriented scenes. For example, the “Happy” expression with high intensity in Talk-Show is more discriminating than the same expression with low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, coined as FERV39k. We analyze the important ingredients of constructing such a novel dataset in three aspects: (1) multi-scene hierarchy and expression class, (2) generation of candidate video clips, (3) trusted manual labelling process. Based on these guidelines, we select 4 scenarios subdivided into 22 scenes, annotate 86k samples automatically obtained from 4k videos based on the welldesigned workflow, and finally build 38,935 video clips labeled with 7 classic expressions. Experiment benchmarks

16 PAPERS • 1 BENCHMARK

4DFAB

4DFAB is a large scale database of dynamic high-resolution 3D faces which consists of recordings of 180 subjects captured in four different sessions spanning over a five-year period (2012 - 2017), resulting in a total of over 1,800,000 3D meshes. It contains 4D videos of subjects displaying both spontaneous and posed facial behaviours. The database can be used for both face and facial expression recognition, as well as behavioural biometrics. It can also be used to learn very powerful blendshapes for parametrising facial behaviour.

14 PAPERS • NO BENCHMARKS YET

DAiSEE

DAiSEE is a multi-label video classification dataset comprising of 9,068 video snippets captured from 112 users for recognizing the user affective states of boredom, confusion, engagement, and frustration "in the wild". The dataset has four levels of labels namely - very low, low, high, and very high for each of the affective states, which are crowd annotated and correlated with a gold standard annotation created using a team of expert psychologists.

14 PAPERS • NO BENCHMARKS YET

MAFW

MAFW is a large-scale, multi-modal, compound affective database for dynamic facial expression recognition in the wild. It contains 10,045 video-audio clips, annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip. For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.

13 PAPERS • 2 BENCHMARKS

Acted Facial Expressions In The Wild (AFEW)

Acted Facial Expressions In The Wild (AFEW) is a dynamic temporal facial expressions data corpus consisting of close to real world environment extracted from movie

7 PAPERS • 1 BENCHMARK

FERG (Facial Expression Research Group Database)

FERG is a database of cartoon characters with annotated facial expressions containing 55,769 annotated face images of six characters. The images for each character are grouped into 7 types of cardinal expressions, viz. anger, disgust, fear, joy, neutral, sadness and surprise.

7 PAPERS • 1 BENCHMARK

CAER

Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 10143-10152

4 PAPERS • 2 BENCHMARKS

FEAFA+

FEAFA+ is a dataset for Facial expression analysis and 3D Facial animation. It includes 150 video sequences from FEAFA and DISFA, with a total of 230,184 frames being manually annotated on floating-point intensity value of 24 redefined AUs using the Expression Quantitative Tool.

3 PAPERS • NO BENCHMARKS YET

SAVEE

SAVEE (Surrey Audio-Visual Expressed Emotion)

The Surrey Audio-Visual Expressed Emotion (SAVEE) dataset was recorded as a pre-requisite for the development of an automatic emotion recognition system. The database consists of recordings from 4 male actors in 7 different emotions, 480 British English utterances in total. The sentences were chosen from the standard TIMIT corpus and phonetically-balanced for each emotion. The data were recorded in a visual media lab with high quality audio-visual equipment, processed and labeled. To check the quality of performance, the recordings were evaluated by 10 subjects under audio, visual and audio-visual conditions. Classification systems were built using standard features and classifiers for each of the audio, visual and audio-visual modalities, and speaker-independent recognition rates of 61%, 65% and 84% achieved respectively.

3 PAPERS • 1 BENCHMARK

MH-FED (Meta Human Facial Expression Dataset)

This dataset provides a collection of 162K images and 70 Videos of Meta-Humans. There are 10 Highly realistic Meta-Humans expressing 7 facial expressions.

1 PAPER • NO BENCHMARKS YET

4,458 People - 3D Facial Expressions Recognition Data

Description: 4,458 People - 3D Facial Expressions Recognition Data. The collection scenes include indoor scenes and outdoor scenes. The dataset includes males and females. The age distribution ranges from juvenile to the elderly, the young people and the middle aged are the majorities. The device includes iPhone X, iPhone XR. The data diversity includes different expressions, different ages, different races, different collecting scenes. This data can be used for tasks such as 3D facial expression recognition.

0 PAPER • NO BENCHMARKS YET

Human faces with mixed-race & various emotions

This dataset consists of 600+ items of faces with different emotions and mixed races that are ready to use for optimizing the accuracy of computer vision models. Human age range from 20 to 60 years old, balance in gender, no occlusion, with head direction (<45 degree up-down and left-right). All of the contents is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia Pacific region offering fully-managed services, high quality contents and data, and powerful tools for businesses & organizations to enable their creative and machine learning projects. For more details, please refer to the link: https://www.pixta.ai/ Or send your inquiries to contact@pixta.ai

0 PAPER • NO BENCHMARKS YET

Datasets

29 dataset results for Facial Expression Recognition (FER)