The LFW dataset contains 13,233 images of faces collected from the web. This dataset consists of the 5749 identities with 1680 people with two or more images. In the standard LFW evaluation protocol the verification accuracies are reported on 6000 face pairs.
811 PAPERS • 12 BENCHMARKS
The CASIA-WebFace dataset is used for face verification and face identification tasks. The dataset contains 494,414 face images of 10,575 real identities collected from the web.
408 PAPERS • 2 BENCHMARKS
The MS-Celeb-1M dataset is a large-scale face recognition dataset consists of 100K identities, and each identity has about 100 facial images. The original identity labels are obtained automatically from webpages.
254 PAPERS • NO BENCHMARKS YET
The Extended Yale B database contains 2414 frontal-face images with size 192×168 over 38 subjects and about 64 images per subject. The images were captured under different lighting conditions and various facial expressions.
183 PAPERS • 1 BENCHMARK
MORPH is a facial age estimation dataset, which contains 55,134 facial images of 13,617 subjects ranging from 16 to 77 years old.
177 PAPERS • 8 BENCHMARKS
The IJB-B dataset is a template-based face dataset that contains 1845 subjects with 11,754 images, 55,025 frames and 7,011 videos where a template consists of a varying number of still images and video frames from different sources. These images and videos are collected from the Internet and are totally unconstrained, with large variations in pose, illumination, image quality etc. In addition, the dataset comes with protocols for 1-to-1 template-based face verification, 1-to-N template-based open-set face identification, and 1-to-N open-set video face identification.
156 PAPERS • 5 BENCHMARKS
The Adience dataset, published in 2014, contains 26,580 photos across 2,284 subjects with a binary gender label and one label from eight different age groups, partitioned into five splits. The key principle of the data set is to capture the images as close to real world conditions as possible, including all variations in appearance, pose, lighting condition and image quality, to name a few.
121 PAPERS • 6 BENCHMARKS
The VGG Face dataset is face identity recognition dataset that consists of 2,622 identities. It contains over 2.6 million images.
94 PAPERS • 1 BENCHMARK
To validate the racial bias of four commercial APIs and four state-of-the-art (SOTA) algorithms.
69 PAPERS • NO BENCHMARKS YET
CASIA-FASD is a small face anti-spoofing dataset containing 50 subjects.
41 PAPERS • NO BENCHMARKS YET
Partial REID is a specially designed partial person reidentification dataset that includes 600 images from 60 people, with 5 full-body images and 5 occluded images per person. These images were collected on a university campus by 6 cameras from different viewpoints, backgrounds and different types of occlusion. The examples of partial persons in the Partial REID dataset are shown in the Figure.
41 PAPERS • 1 BENCHMARK
The color FERET database is a dataset for face recognition. It contains 11,338 color images of size 512×768 pixels captured in a semi-controlled environment with 13 different poses from 994 subjects.
36 PAPERS • 3 BENCHMARKS
UMDFaces is a face dataset divided into two parts:
34 PAPERS • NO BENCHMARKS YET
TinyFace is a large scale face recognition benchmark to facilitate the investigation of natively LRFR (Low Resolution Face Recognition) at large scales (large gallery population sizes) in deep learning. The TinyFace dataset consists of 5,139 labelled facial identities given by 169,403 native LR face images (average 20×16 pixels) designed for 1:N recognition test. All the LR faces in TinyFace are collected from public web data across a large variety of imaging scenarios, captured under uncontrolled viewing conditions in pose, illumination, occlusion and background.
28 PAPERS • 1 BENCHMARK
CelebA-Spoof is a large-scale face anti-spoofing dataset with the following properties:
27 PAPERS • NO BENCHMARKS YET
IMDb-Face is large-scale noise-controlled dataset for face recognition research. The dataset contains about 1.7 million faces, 59k identities, which is manually cleaned from 2.0 million raw images. All images are obtained from the IMDb website.
22 PAPERS • NO BENCHMARKS YET
Dataset for face anti-spoofing in terms of both subjects and modalities. Specifically, it consists of subjects with videos and each sample has modalities (i.e., RGB, Depth and IR).
21 PAPERS • NO BENCHMARKS YET
DigiFace-1M is a synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline. It contains 1.22M images of 110K unique identities. The dataset consists of two parts. The first part contains 720K images with 10K identities. For each identity, 4 different sets of accessories are sampled and 18 images are rendered for each set. The second part contains 500K images with 100K identities. For each identity, only one set of accessories is sampled and only 5 images are rendered. Following the format of the existing datasets, we provide the aligned crop around the face, resized into $112 \times 112$ resolution.
18 PAPERS • NO BENCHMARKS YET
The Replay-Mobile Database for face spoofing consists of 1190 video clips of photo and video attack attempts to 40 clients, under different lighting conditions. These videos were recorded with current devices from the market -- an iPad Mini2 (running iOS) and a LG-G4 smartphone (running Android). This Database was produced at the Idiap Research Institute (Switzerland) within the framework of collaboration with Galician Research and Development Center in Advanced Telecommunications - Gradiant (Spain).
WebFace260M is a million-scale face benchmark, which is constructed for the research community towards closing the data gap behind the industry.
A renovation of Labeled Faces in the Wild (LFW), the de facto standard testbed for unconstraint face verification.
17 PAPERS • 4 BENCHMARKS
During the COVID-19 coronavirus epidemic, almost everyone wears a facial mask, which poses a huge challenge to face recognition. Traditional face recognition systems may not effectively recognize the masked faces, but removing the mask for authentication will increase the risk of virus infection. Inspired by the COVID-19 pandemic response, the widespread requirement that people wear protective face masks in public places has driven a need to understand how face recognition technology deals with occluded faces, often with just the periocular area and above visible.
16 PAPERS • 1 BENCHMARK
An evaluation protocol for face verification focusing on a large intra-pair image quality difference.
The IDiff-Face dataset was proposed in the paper "IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models". This dataset is synthetically generated using the IDiff-Face model.
12 PAPERS • NO BENCHMARKS YET
Real-World Masked Face Dataset (RMFD) is a large dataset for masked face detection.
11 PAPERS • NO BENCHMARKS YET
A new face annotation dataset with balanced distribution between genders and ethnic origins.
10 PAPERS • 2 BENCHMARKS
QMUL-SurvFace is a surveillance face recognition benchmark that contains 463,507 face images of 15,573 distinct identities captured in real-world uncooperative surveillance scenes over wide space and time.
10 PAPERS • 1 BENCHMARK
MeGlass is an eyeglass dataset originally designed for eyeglass face recognition evaluation. All the face images are selected and cleaned from MegaFace. Each identity has at least two face images with eyeglass and two face images without eyeglass. It contains 47,817 images from 1,710 different identities.
8 PAPERS • NO BENCHMARKS YET
The iCartoonFace dataset is a large-scale dataset that can be used for two different tasks: cartoon face detection and cartoon face recognition.
8 PAPERS • 1 BENCHMARK
Contains over 11000 images of 1000 identities with different types of disguise accessories. The dataset is collected from the Internet, resulting in unconstrained face images similar to real world settings.
7 PAPERS • 3 BENCHMARKS
Although deep face recognition has achieved impressive results in recent years, there is increasing controversy regarding racial and gender bias of the models, questioning their trustworthiness and deployment into sensitive scenarios. DemogPairs is a validation set with 10.8K facial images and 58.3M identity verification pairs, distributed in demographically-balanced folds of Asian, Black and White females and males. We also propose a benchmark of experiments using DemogPairs over state-of-the-art deep face recognition models in order to analyze their cross-demographic behavior and potential demographic biases (see figure below).
7 PAPERS • NO BENCHMARKS YET
A multimodal database for eye blink detection and attention level estimation.
MCXFace is a heterogeneous face recognition dataset consisting of multi-channel image samples for 51 subjects. For each subject color (RGB), thermal, near-infrared (850 nm), short-wave infrared (1300 nm), Depth, Stereo depth, and depth estimated from RGB images are available. Overall 7406 images together with landmark annotations and standard protocols are available in this dataset.
6 PAPERS • NO BENCHMARKS YET
The Masked LFW (MLFW), based on Cross-Age LFW (CALFW) database, is built using a simple but effective tool that generates masked faces from unmasked faces automatically.
6 PAPERS • 1 BENCHMARK
Aims to facilitate research in caricature recognition. All the caricatures and face images were collected from the Web. Compared with two existing datasets, this dataset is much more challenging, with a much greater number of available images, artistic styles and larger intra-personal variations.
iQIYI-VID dataset, which comprises video clips from iQIYI variety shows, films, and television dramas. The whole dataset contains 500,000 videos clips of 5,000 celebrities. The length of each video is 1~30 seconds.
Drone Surveillance of Faces, is a large-scale drone dataset intended to facilitate research for face recognition using drones.
5 PAPERS • 1 BENCHMARK
ROF is a dataset for occluded face recognition that contains faces with both upper face occlusion, due to sunglasses, and lower face occlusion, due to masks.
5 PAPERS • NO BENCHMARKS YET
Large, multimodal biometric dataset: It contains still images and videos of over 1,000 people captured at various ranges (up to 1,000 meters) and elevations (up to 400 meters) using a diverse set of cameras (commercial, military-grade, specialized).
4 PAPERS • 2 BENCHMARKS
The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images. Our method relies on Spark AR Studio, a developer program made by Facebook that is used to create Instagram face filters. In our approach, we use 9 masks of different colors, shapes and fabrics. We employ our method to generate a number of 445,446 (90%) samples of masks for the CASIA-WebFace data set.
4 PAPERS • 1 BENCHMARK
The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images. Our method relies on Spark AR Studio, a developer program made by Facebook that is used to create Instagram face filters. In our approach, we use 9 masks of different colors, shapes and fabrics. We employ our method to generate a number of 196,254 (96.8%) masks for the CelebA data set.
Paper Abstract
3 PAPERS • 4 BENCHMARKS
We have cleaned the noisy IMDB-WIKI dataset using a constrained clustering method, resulting this new benchmark for in-the-wild age estimation. The annotations also allow this dataset to use for some other tasks, like gender classification and face recognition/verification. For more details, please refer to our FPAge paper.
3 PAPERS • 1 BENCHMARK
Description: 1,995 People Face Images Data (Asian race). For each subject, more than 20 images per person with frontal face were collected. This data can be used for face recognition and other tasks.
2 PAPERS • NO BENCHMARKS YET
CASIA-Face-Africa is a face image database which contains 38,546 images of 1,183 African subjects. Multi-spectral cameras are utilized to capture the face images under various illumination settings. Demographic attributes and facial expressions of the subjects are also carefully recorded. For landmark detection, each face image in the database is manually labeled with 68 facial keypoints. A group of evaluation protocols are constructed according to different applications, tasks, partitions and scenarios. The proposed database along with its face landmark annotations, evaluation protocols and preliminary results form a good benchmark to study the essential aspects of face biometrics for African subjects, especially face image preprocessing, face feature analysis and matching, facial expression recognition, sex/age estimation, ethnic classification, face image generation, etc.
Celeb-HQ Face Gender Recognition Dataset
Celeb-HQ Facial Identity Recognition Dataset