The ADE20K semantic segmentation dataset contains more than 20K scene-centric images exhaustively annotated with pixel-level object and object-part labels. There are 150 semantic categories in total, covering stuff classes such as sky, road, and grass, as well as discrete objects such as person, car, and bed. A minimal mask-inspection sketch follows this entry.
604 PAPERS • 16 BENCHMARKS
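The sketch below is an unofficial, minimal example of inspecting an ADE20K-style annotation mask. It assumes a single-channel index mask with 0 meaning "unlabeled" (as in the SceneParse150 release); the file name is hypothetical.

```python
# Minimal sketch (not an official loader): list which of the 150 ADE20K
# class indices appear in one annotation mask. Assumes a single-channel
# index mask with 0 = "unlabeled"; the file name is hypothetical.
import numpy as np
from PIL import Image

mask = np.array(Image.open("ADE_train_00000001_seg.png"))
class_ids = np.unique(mask)
class_ids = class_ids[class_ids != 0]  # drop the assumed "unlabeled" index
print(f"{len(class_ids)} of the 150 categories appear in this image: {class_ids.tolist()}")
```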
Manga109 has been compiled by the Aizawa Yamasaki Matsui Laboratory, Department of Information and Communication Engineering, the Graduate School of Information Science and Technology, the University of Tokyo. The compilation is intended for use in academic research on the media processing of Japanese manga. Manga109 is composed of 109 manga volumes drawn by professional manga artists in Japan. These manga were commercially made available to the public between the 1970s and 2010s, and encompass a wide range of target readerships and genres (see the table in Explore for further details). Most of the manga in the compilation are available at the manga library "Manga Library Z" (formerly the "Zeppan Manga Toshokan" library of out-of-print manga).
180 PAPERS • 12 BENCHMARKS
The Face Detection Dataset and Benchmark (FDDB) is a collection of labeled faces from the Faces in the Wild dataset. It contains a total of 5,171 face annotations in images of varying resolution (e.g., 363x450 and 229x410). The dataset incorporates a range of challenges, including difficult pose angles, out-of-focus faces, and low resolution. Both greyscale and color images are included.
155 PAPERS • 1 BENCHMARK
AFW (Annotated Faces in the Wild) is a face detection dataset that contains 205 images with 468 faces. Each face image is labeled with at most 6 landmarks with visibility labels, as well as a bounding box.
147 PAPERS • 1 BENCHMARK
UMDFaces is a face dataset divided into two parts: still images and video frames.
28 PAPERS • NO BENCHMARKS YET
The PASCAL FACE dataset is a dataset for face detection and face recognition. It consists of 851 images, a subset of PASCAL VOC, with a total of 1,341 face annotations. The dataset contains only a few hundred images and has limited variation in face appearance.
21 PAPERS • 1 BENCHMARK
WIDER is a dataset for complex event recognition from static images. As of v0.1, it contains 61 event categories and around 50,574 images annotated with event class labels.
16 PAPERS • 1 BENCHMARK
COCO-WholeBody is an extension of the COCO dataset with whole-body annotations. For each person in an image there are four types of bounding boxes (person box, face box, left-hand box, and right-hand box) and 133 keypoints (17 for the body, 6 for the feet, 68 for the face, and 42 for the hands). A keypoint-splitting sketch follows this entry.
14 PAPERS • 6 BENCHMARKS
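As a rough illustration of the whole-body keypoint layout above, the following sketch splits a flat 133-keypoint array into the four groups. The flat (x, y, visibility) layout and the group ordering are illustrative assumptions, not the official annotation schema.

```python
# Minimal sketch: split 133 whole-body keypoints into the groups listed
# above (17 body, 6 feet, 68 face, 42 hands). The flat (x, y, visibility)
# layout and the group order are illustrative assumptions only.
import numpy as np

keypoints = np.zeros((133, 3))  # placeholder keypoints: x, y, visibility

group_sizes = {"body": 17, "feet": 6, "face": 68, "hands": 42}
parts, start = {}, 0
for name, size in group_sizes.items():
    parts[name] = keypoints[start:start + size]
    start += size

assert start == 133  # 17 + 6 + 68 + 42 = 133
```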
The MALF dataset is a large dataset of 5,250 images annotated with multiple facial attributes; it is specifically constructed for fine-grained evaluation.
14 PAPERS • NO BENCHMARKS YET
Unconstrained Face Detection Dataset (UFDD) aims to fuel further research in unconstrained face detection.
13 PAPERS • NO BENCHMARKS YET
MaskedFace-Net proposes three masked face detection datasets: the Correctly Masked Face Dataset (CMFD), the Incorrectly Masked Face Dataset (IMFD), and their combination for global masked face detection.
11 PAPERS • NO BENCHMARKS YET
Real-World Masked Face Dataset (RMFD) is a large dataset for masked face detection.
9 PAPERS • NO BENCHMARKS YET
The iCartoonFace dataset is a large-scale dataset that can be used for two different tasks: cartoon face detection and cartoon face recognition.
6 PAPERS • 1 BENCHMARK
Contains video clips shot with modern high-resolution mobile cameras, exhibiting strong projective distortions and low-light conditions.
5 PAPERS • NO BENCHMARKS YET
The DCM dataset is composed of 772 annotated images from 27 golden-age comic books, collected from the public-domain collection of digitized comic books at the Digital Comics Museum. One album per available publisher was selected to cover as many different styles as possible. Ground-truth bounding boxes are provided for all panels and all characters (bodies and faces), small or large, human-like or animal-like.
4 PAPERS • 3 BENCHMARKS
A large-scale, hierarchically annotated dataset of animal faces, featuring 21.9K faces from 334 diverse species and 21 animal orders across the biological taxonomy. The faces are captured under in-the-wild conditions and are consistently annotated with 9 landmarks on key facial features. The proposed dataset is structured and scalable by design; its development underwent four systematic stages involving a rigorous manual annotation effort of over 6K man-hours.
3 PAPERS • NO BENCHMARKS YET
The Multimodal Dyadic Behavior (MMDB) dataset is a unique collection of multimodal (video, audio, and physiological) recordings of the social and communicative behavior of toddlers. The MMDB contains 160 sessions of 3-5 minute semi-structured play interaction between a trained adult examiner and a child between the ages of 15 and 30 months. The MMDB dataset supports a novel problem domain for activity recognition, which consists of the decoding of dyadic social interactions between adults and children in a developmental context.
BAFMD contains images posted on Twitter from around the world during the pandemic, with additional images from underrepresented race and age groups to mitigate dataset bias in the face mask detection task.
2 PAPERS • NO BENCHMARKS YET
CASIA-Face-Africa is a face image database that contains 38,546 images of 1,183 African subjects. Multi-spectral cameras are utilized to capture the face images under various illumination settings. Demographic attributes and facial expressions of the subjects are also carefully recorded. For landmark detection, each face image in the database is manually labeled with 68 facial keypoints. A set of evaluation protocols is constructed according to different applications, tasks, partitions, and scenarios. The proposed database, along with its face landmark annotations, evaluation protocols, and preliminary results, forms a good benchmark to study the essential aspects of face biometrics for African subjects, especially face image preprocessing, face feature analysis and matching, facial expression recognition, sex/age estimation, ethnic classification, face image generation, etc.
A 360-degree fisheye-like version of the popular FDDB face detection dataset.
The Hochschule Darmstadt (HDA) facial tattoo and paintings database contains 500 pairs of facial images of individuals with and without facial tattoos or paintings. The database was collected from multiple online sources.
Description: 1,078 People 3D Faces Collection Data. The collection device is a RealSense SR300. Each subject was captured once a week, 6 times in total, so the time span is 6 weeks; 16 videos were collected per subject. The dataset can be used for tasks such as 3D face recognition.
1 PAPER • 1 BENCHMARK
FAD is a dataset with roughly 200,000 facial attribute labels for over 10,000 facial images.
1 PAPER • NO BENCHMARKS YET
The Human-Parts dataset is a dataset for human body, face and hand detection with ~15k images. It contains ~106k different annotations, with multiple annotations per image.
Consists of a large number of unconstrained multi-view and partially occluded faces.
A dataset originally conceived for multi-face tracking and detection in highly crowded scenarios, where the face is the only part that can be used to track the individuals.
MobiFace is the first dataset for single face tracking in mobile situations. It consists of 80 unedited live-streaming mobile videos captured by 70 different smartphone users in fully unconstrained environments. Over 95K bounding boxes are manually labelled. The videos are carefully selected to cover typical smartphone usage. The videos are also annotated with 14 attributes, including 6 newly proposed attributes and 8 commonly seen in object tracking.
In the last two years, millions of lives have been lost to COVID-19. Despite a year of vaccination programmes, hospitalization rates and deaths remain high due to new variants of COVID-19. Stringent guidelines and COVID-19 screening measures, such as temperature checks and mask checks at all public places, are helping reduce the spread of COVID-19. Visual inspection to enforce these screening measures can be taxing and error-prone, whereas automated inspection enables effective and accurate screening.
Procedural Human Action Videos contains a total of 39,982 videos, with more than 1,000 examples for each action of 35 categories.
In this paper, we introduce a victim dataset for the RoboCup Rescue competitions. The RoboCup Rescue robots have to collect points across several disciplines, e.g., a search task in which an area is surveyed for simulated victims (baby dolls).
Description: 1,995 People Face Images Data (Asian race). For each subject, more than 20 frontal-face images were collected. This data can be used for face recognition and other tasks.
0 PAPERS • NO BENCHMARKS YET
Description: 110 People – Human Face Image Data with Multiple Angles, Light Conditions, and Expressions. The subjects are all young people. For each subject, 2,100 images were collected, covering 14 camera angles × 5 lighting conditions × 30 expressions (14 × 5 × 30 = 2,100). The data can be used for face recognition, 3D face reconstruction, etc.
Description: 23 Pairs of Identical Twins Face Image Data. The collection scenes include indoor and outdoor scenes. The subjects are Chinese males and females. The data diversity includes multiple face angles, multiple face postures, close-ups of the eyes, multiple light conditions, and multiple age groups. This dataset can be used for tasks such as twins' face recognition.
Description: 399 Chinese People 35,112 Images Multi-pose Face Data with 21 Facial Landmarks Annotation. The data was collected from 399 people (88 images per person). The data diversity includes multiple poses, different ages, different light conditions, and multiple scenes. This data can be used for tasks such as face detection and face recognition.
Description: 4,082 Families – Family Face Data. The data includes various scenes, different families, and 11 kinds of kinship pairs. One family photo was collected for each family, and each family includes at least three family members. The 11 kinds of kinship pairs, the key points of the two pupils, and the bounding box of each face were annotated. The data can be used for tasks such as kinship verification, searching for missing family members, and organizing family photo albums.
Description: 4,999 People 55,348 Images Infant Faces Collection Data. The data includes indoor and outdoor scenes, with at least two backgrounds for each person. The dataset includes boys and girls (Chinese). The data diversity includes multiple scenes, multiple ages, multiple angles, and multiple light conditions. This data can be used for tasks such as infant face recognition.
Description: 5,011 Images – Human Frontal Face Data (Male). The data diversity includes multiple scenes, multiple ages, and multiple races. This dataset includes 2,004 Caucasians and 3,007 Asians. This dataset can be used for tasks such as face detection, race detection, age detection, and beard category classification.
Description: 500 People Face Image and Video Data (Asian race). For each subject, 12 videos and 30 images were collected. The data diversity includes faces with and without glasses, looking up, looking down, front view, etc. This data can be used for tasks such as face recognition.
Description: 55 Videos – Juvenile Face Data. The subjects' facial poses are varied and their facial expressions are natural. The dataset can be used for face detection, face recognition, and other tasks.
Description: 64,378 Images Data of 1,073 Dogs' Noses. The data includes indoor and outdoor scenes (the collection scene for a given dog did not change). The data covers multiple dog breeds (such as Teddy, Labrador, Shiba Inu, etc.) and multiple lighting conditions. Segmentation annotation was done on the dogs' noses. The data can be applied to dog face recognition, dog identification, etc.
Description: 9,181 People 59,490 Images Cross-age Faces Data. The data includes indoor and outdoor scenes. The dataset includes female and male subjects (Chinese). For most people, the age span is at least 10 years; for only a few people (128), the span is less than 10 years. For each person, at least 4 frontal images were collected. The data can be used for tasks such as cross-age face recognition.
Facial landmark detection is a cornerstone of many facial analysis tasks such as face recognition, drowsiness detection, and facial expression recognition. Numerous methods have been introduced to achieve accurate and efficient facial landmark localization in visual images; however, only a few works address facial landmark detection in thermal images, and the main challenge is the limited number of annotated datasets. In this work, we present a thermal face dataset with annotated face bounding boxes and facial landmarks. The dataset contains 2,556 thermal images of 142 individuals, where each thermal image is paired with the corresponding visual image. To the best of our knowledge, our dataset is the largest in terms of the number of individuals. In addition, our dataset can be employed for tasks such as thermal-to-visual image translation, thermal-visual face recognition, and others. We trained two models for the facial landmark detection task to show the efficacy of our dataset.
Face detection and the subsequent localization of facial landmarks are the primary steps in many face applications. Numerous algorithms and benchmark datasets have been introduced to develop robust models for the visible domain, yet varying illumination conditions still pose challenging problems. In this regard, thermal cameras are employed to address the problem, because they operate at longer wavelengths. However, thermal face and facial landmark detection in the wild remains an open research problem, because most existing thermal datasets were collected in controlled environments and many of them are not annotated with face bounding boxes and facial landmarks. In this work, we present a thermal face dataset with manually labeled bounding boxes and facial landmarks to address these problems. The dataset contains 9,982 images of 147 subjects collected under controlled and uncontrolled conditions. As a baseline, we trained the YOLOv5 object detection model and its adaptations.
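As a loose illustration of the YOLOv5 baseline mentioned above, the sketch below runs a generic pretrained YOLOv5 model on a single image via torch.hub. The image path is hypothetical, and a model fine-tuned on the thermal dataset would be needed for meaningful face detections; this only illustrates the inference API.

```python
# Minimal sketch: run a generic pretrained YOLOv5 model on one image via
# torch.hub. The image path is hypothetical; detecting thermal faces well
# would require fine-tuning on the thermal dataset described above.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("thermal_face_example.jpg")  # hypothetical image path
results.print()                              # summary of detections
boxes = results.xyxy[0]                      # per-detection (x1, y1, x2, y2, conf, class)
```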