FaceForensics is a video dataset consisting of more than 500,000 frames containing faces from 1004 videos that can be used to study image or video forgeries. All videos are downloaded from YouTube and cut down to short continuous clips that contain mostly frontal faces. Self-reenactment: the authors use Face2Face to reenact the facial expressions of videos with their own facial expressions as input, yielding ground-truth video pairs that can, e.g., be used to train supervised methods.
82 PAPERS • 1 BENCHMARK
FaceForensics++ is a forensics dataset consisting of 1000 original video sequences that have been manipulated with four automated face manipulation methods: Deepfakes, Face2Face, FaceSwap and NeuralTextures. The data has been sourced from 977 YouTube videos, and all videos contain a trackable, mostly frontal face without occlusions, which enables automated tampering methods to generate realistic forgeries.
315 PAPERS • 2 BENCHMARKS
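As a sketch of how such a dataset is typically consumed, the snippet below enumerates real and manipulated clips per method. The directory names follow the layout used by the public FaceForensics++ download scripts; treat them as assumptions if your local copy is organised differently.

```python
from pathlib import Path

# Hypothetical local copy of FaceForensics++; adjust to your download location.
ROOT = Path("FaceForensics++")
METHODS = ["Deepfakes", "Face2Face", "FaceSwap", "NeuralTextures"]

def list_videos(compression: str = "c23"):
    """Yield (label, method, path) triples for real and manipulated clips."""
    for p in (ROOT / "original_sequences" / "youtube" / compression / "videos").glob("*.mp4"):
        yield ("real", "original", p)
    for method in METHODS:
        for p in (ROOT / "manipulated_sequences" / method / compression / "videos").glob("*.mp4"):
            yield ("fake", method, p)

for label, method, path in list_videos():
    print(label, method, path.name)
```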
MobiFace is the first dataset for single face tracking in mobile situations.
1 PAPER • NO BENCHMARKS YET
The Toronto NeuroFace Dataset is a public dataset for facial motion analysis in individuals with neurological disorders; it contains videos of oro-facial gestures performed by the study participants.
0 PAPERS • NO BENCHMARKS YET
Dataset originally conceived for multi-face tracking/detection in highly crowded scenarios, where the face is the only body part that can be used to track individuals. All videos present novel crowd scenes recorded at near-eye level, where faces are visible enough to be analysed at the microscopic level while also benefiting from a macroscopic view of the crowd. It includes face detections of 715 unique subjects, with more than 75k annotated face detections, along with instructions to download the synchronized videos. The dataset may be useful for face tracking in crowded scenarios (typically from video-surveillance cameras), heavily occluded body tracking (in many videos, only the face is mostly visible), face recognition, and face detection for partially occluded faces.
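To illustrate the tracking use case, here is a minimal greedy IoU-association baseline. It is purely illustrative (not the dataset's official tracker), and all names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_track(frames, thresh=0.3):
    """frames: list of per-frame face-box lists; returns tracks as lists of (frame_idx, box)."""
    tracks = []   # each track: list of (frame_idx, box)
    active = []   # indices into `tracks` that are still alive
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        next_active = []
        for ti in active:
            last_box = tracks[ti][-1][1]
            best = max(unmatched, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) >= thresh:
                tracks[ti].append((t, best))
                unmatched.remove(best)
                next_active.append(ti)
        for b in unmatched:  # start a new track for every unmatched detection
            tracks.append([(t, b)])
            next_active.append(len(tracks) - 1)
        active = next_active
    return tracks
```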
The IJB-C dataset is a video-based face recognition dataset. It is an extension of the IJB-A dataset with about 138,000 face images, 11,000 face videos, and 10,000 non-face images.
214 PAPERS • 3 BENCHMARKS
The proposed Extended-YouTube Faces (E-YTF) is an extension of the famous YouTube Faces (YTF) dataset and is specifically designed to further push the challenges of face recognition by addressing the problem of open-set face identification from heterogeneous data, i.e., still images vs. videos.
The IJB-B dataset is a template-based face dataset that contains 1845 subjects with 11,754 images, 55,025 frames and 7,011 videos, where a template consists of a varying number of still images and video frames. In addition, the dataset comes with protocols for 1-to-1 template-based face verification, 1-to-N template-based open-set face identification, and 1-to-N open-set video face identification.
143 PAPERS • 5 BENCHMARKS
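Template-based verification is commonly implemented by pooling the per-image embeddings of each template into a single vector and comparing templates by cosine similarity. A minimal sketch follows; the embedding extractor is a placeholder (random vectors stand in for a real face CNN), and the threshold is illustrative.

```python
import numpy as np

def template_embedding(embeddings: np.ndarray) -> np.ndarray:
    """Average the L2-normalised per-image embeddings of one template."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    mean = e.mean(axis=0)
    return mean / np.linalg.norm(mean)

def verify(template_a: np.ndarray, template_b: np.ndarray, threshold: float = 0.4) -> bool:
    """1:1 verification by cosine similarity between pooled templates."""
    return float(template_a @ template_b) >= threshold

# Toy usage with random stand-in embeddings (a real system would use a face CNN).
a = template_embedding(np.random.randn(5, 512))
b = template_embedding(np.random.randn(3, 512))
print(verify(a, b))
```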
VPCD contains multi-modal annotations (face, body and voice) for all primary and secondary characters from a range of diverse TV shows and movies. It contains body-tracks for each annotated character, face-tracks when visible, and voice-tracks when speaking, with their associated features. It consists of more than 30,000 face and body tracks of 300+ characters, from over 23 hours of video.
7 PAPERS • 1 BENCHMARK
VideoForensicsHQ is a benchmark dataset for face video forgery detection, providing high quality visual manipulations. It is one of the first face video manipulation benchmark sets that also contains audio and thus complements existing datasets along a new challenging dimension. VideoForensicsHQ contains 1,737 videos of speaking faces (44% male, 56% female), with 8 different emotions, most of them of “HD” resolution. The videos amount to 1,666,816 frames.
2 PAPERS • NO BENCHMARKS YET
UDIVA is a new non-acted dataset of face-to-face dyadic interactions, where interlocutors perform competitive and collaborative tasks with different behavior elicitation and cognitive workload.
6 PAPERS • NO BENCHMARKS YET
…The fake videos are created using 6 different methods: FaceSwap, DeepFaceLab, FSGAN, FOMM, ATFHP and Wav2Lip.
17 PAPERS • NO BENCHMARKS YET
…The dataset is audio-visual, so it is also useful for a number of other applications, for example: visual speech synthesis, speech separation, cross-modal transfer from face to voice or vice versa, and training face recognition from video to complement existing face recognition datasets.
495 PAPERS • 5 BENCHMARKS
…The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.
35 PAPERS • NO BENCHMARKS YET
4DFAB is a large-scale database of dynamic high-resolution 3D faces, consisting of recordings of 180 subjects captured in four different sessions spanning a five-year period (2012-2017). The database can be used for both face and facial expression recognition, as well as behavioural biometrics; it can also be used to learn very powerful blendshapes for parametrising facial behaviour.
14 PAPERS • NO BENCHMARKS YET
The MFA (Many Faces of Anger) dataset includes 200 in-the-wild videos from North American and Persian cultures with fine-grained labels of 'annoyed', 'anger', 'disgust', 'hatred' and 'furious', and 13 …
1 PAPER • 1 BENCHMARK
The Oulu-NPU face presentation attack detection database consists of 4950 real access and attack videos. The 2D face artefacts were created using two printers and two display devices. The videos of the 55 subjects are divided into three subject-disjoint subsets for training, development and testing. Four test protocols are used to evaluate the generalization capability of face PAD methods across three covariates: unknown environmental conditions (namely illumination and background scene), acquisition devices, and presentation attack instruments (PAI).
5 PAPERS • 1 BENCHMARK
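Subject-disjoint evaluation simply means that no subject ID appears in more than one split. A hedged sketch of such a partition is below; the ratios and seed are illustrative and do not reproduce the official Oulu-NPU protocol files.

```python
import random

def subject_disjoint_splits(subject_ids, ratios=(0.6, 0.2, 0.2), seed=0):
    """Partition subject IDs (not videos) into train/dev/test so splits share no subject."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_dev = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]

# 55 subjects, as in Oulu-NPU; the splits must be pairwise disjoint.
train, dev, test = subject_disjoint_splits(range(1, 56))
assert not (set(train) & set(dev) or set(dev) & set(test) or set(train) & set(test))
```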
DeeperForensics-1.0 represents the largest face forgery detection dataset by far, with 60,000 videos comprising a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. The source videos were collected from 100 paid and consented actors from 26 countries, and the manipulated videos are generated by a newly proposed many-to-many end-to-end face swapping method, DF-VAE. Seven types of real-world perturbations at five intensity levels are applied to make the benchmark more challenging.
21 PAPERS • NO BENCHMARKS YET
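Perturbations at graded intensity levels are usually implemented as parameterised distortion functions. The sketch below uses additive Gaussian noise as one example distortion; the level-to-parameter mapping is illustrative, not DeeperForensics' exact settings.

```python
import numpy as np

# Noise standard deviations for intensity levels 1..5 (illustrative values).
NOISE_STD = {1: 2.0, 2: 5.0, 3: 10.0, 4: 20.0, 5: 40.0}

def gaussian_noise(frame: np.ndarray, level: int) -> np.ndarray:
    """Apply additive Gaussian noise to a uint8 HxWx3 frame at the given intensity level."""
    noisy = frame.astype(float) + np.random.normal(0.0, NOISE_STD[level], frame.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

frame = np.full((4, 4, 3), 128, dtype=np.uint8)  # toy grey frame
print(gaussian_noise(frame, level=3).dtype, gaussian_noise(frame, level=3).shape)
```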
CelebV-Text comprises 70,000 in-the-wild face video clips with diverse visual content, each paired with 20 texts generated using the proposed semi-automatic text generation strategy.
4 PAPERS • NO BENCHMARKS YET
We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) image forgery classification (including two-way real/fake classification), 2) spatial forgery localization, 3) video forgery classification, and 4) temporal forgery localization. ForgeryNet is by far the largest publicly available deep face forgery dataset in terms of data scale (2.9 million images, 221,247 videos) and manipulations (7 image-level approaches, 8 video-level approaches). We perform extensive benchmarking and studies of existing face forensics methods and obtain several valuable observations.
3 PAPERS • 2 BENCHMARKS
…The dataset contains both target task labels (action) and selected privacy attributes (skin color, face, gender, nudity, and relationship) annotated on a per-frame basis.
7 PAPERS • NO BENCHMARKS YET
…The data was recorded from a CCTV camera in Oxford for research and development into activity and face recognition.
Consists of the key scenes from over 3K movies: each key scene is accompanied by a high-level semantic description of the scene, character face-tracks, and metadata about the movie.
11 PAPERS • NO BENCHMARKS YET
WildDeepfake is a dataset for real-world deepfakes detection which consists of 7,314 face sequences extracted from 707 deepfake videos that are collected completely from the internet.
34 PAPERS • NO BENCHMARKS YET
…It includes face tracks from over 400 hours of TED and TEDx videos, along with the corresponding subtitles and word alignment boundaries.
48 PAPERS • 6 BENCHMARKS
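Word alignment boundaries make it straightforward to cut word-level clips from the face tracks. A minimal sketch, assuming alignments arrive as (word, start_s, end_s) tuples and the video has a fixed frame rate:

```python
def word_frame_spans(alignments, fps=25.0):
    """Map (word, start_s, end_s) alignments to (word, first_frame, last_frame) spans."""
    spans = []
    for word, start_s, end_s in alignments:
        first = int(round(start_s * fps))
        last = max(first, int(round(end_s * fps)) - 1)
        spans.append((word, first, last))
    return spans

# Toy usage: two aligned words from a hypothetical subtitle track.
print(word_frame_spans([("hello", 0.40, 0.82), ("world", 0.86, 1.30)]))
```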
…The images inside each zip file are face-only, provided in three convenient sizes: 224 x 224, 512 x 512, and 1024 x 1024 pixels. In the occlusion recordings, the participant covers their eyes with a hand, followed by the left half, the right half, and finally the lower half of the face; the participant then moves a green cloth in front of their face; finally, the participant puts on a face mask, counts from 1 to 10 out loud, and then removes it. FSGAN (Face Swapping Generative Adversarial Network): this corresponds to the second version of FSGAN; access its release at https://github.com/wyhsirius/LIA. As a rule of thumb, in face swaps the imposter provides the outer face and the target provides the inner face.
…The dataset contains the most complex content for the restoration task: faces, text, QR-codes, car numbers, unpatterned textures, small details.
25 PAPERS • 1 BENCHMARK
The Replay-Attack Database for face spoofing consists of 1300 video clips of photo and video attack attempts to 50 clients, under different lighting conditions.
124 PAPERS • 1 BENCHMARK
CASIA-MFSD is a dataset for face anti-spoofing. It contains 50 subjects, and 12 videos for each subject under different resolutions and light conditions.
56 PAPERS • 1 BENCHMARK
…The dataset was collected using one 360° mechanical spinning LiDAR, one forward-facing, long-range LiDAR, and 6 cameras.
37 PAPERS • NO BENCHMARKS YET
The Replay-Mobile Database for face spoofing consists of 1190 video clips of photo and video attack attempts to 40 clients, under different lighting conditions.
18 PAPERS • NO BENCHMARKS YET
The dataset contains the annotations of characters' visual appearances, in the form of tracks of face bounding boxes, and the associations with characters' textual mentions, when available.
3 PAPERS • 1 BENCHMARK
…AVA ActiveSpeaker: associates speaking activity with a visible face, on the AVA v1.0 videos, resulting in 3.65 million frames labeled across ~39K face tracks.
94 PAPERS • 7 BENCHMARKS
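Per-frame speaking labels on a face track are often collapsed into contiguous speaking segments for evaluation or training. A small sketch, assuming the labels are encoded as one boolean per frame (not AVA's release format):

```python
def speaking_segments(frame_labels):
    """Collapse per-frame True/False speaking labels into (start, end) segments, end exclusive."""
    segments, start = [], None
    for i, speaking in enumerate(frame_labels):
        if speaking and start is None:
            start = i
        elif not speaking and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(frame_labels)))
    return segments

print(speaking_segments([False, True, True, False, True]))  # [(1, 3), (4, 5)]
```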
…It can be used for diverse research fields like visual speech recognition, face detection, and biometrics.
…contains AR glasses egocentric multi-channel microphone array audio, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, and head and face bounding boxes.
15 PAPERS • 4 BENCHMARKS
VIPL-HR database is a database for remote heart rate (HR) estimation from face videos under less-constrained situations.
15 PAPERS • 1 BENCHMARK
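A classical baseline for remote HR estimation (not VIPL-HR's own method) traces the mean green-channel intensity of a face region over time and reads off the dominant frequency in the plausible pulse band. A minimal sketch:

```python
import numpy as np

def estimate_hr(green_trace, fps=30.0, lo=0.7, hi=4.0):
    """Estimate heart rate (bpm) from a 1-D mean green-channel trace of a face ROI.
    lo/hi bound the plausible pulse band (42-240 bpm)."""
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]

# Toy check: a synthetic 1.2 Hz pulse should be recovered as ~72 bpm.
t = np.arange(0, 10, 1 / 30.0)
print(round(estimate_hr(np.sin(2 * np.pi * 1.2 * t), fps=30.0)))  # ~72
```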
…The WMCA database is produced at Idiap within the framework of “IARPA BATL” and “H2020 TESLA” projects and it is intended for investigation of presentation attack detection (PAD) methods for face recognition
27 PAPERS • 1 BENCHMARK
…Camera-face distance is about 60 cm. Subjects were asked to make a facial expression according to an expression example shown in picture sequences.
78 PAPERS • 4 BENCHMARKS
…Each flight includes sensor data from 120Hz stereo and downward-facing photorealistic virtual cameras, 100Hz IMU, motor speed sensors, and 360Hz millimeter-accurate motion capture ground truth.
9 PAPERS • NO BENCHMARKS YET
…(Data collection was in accordance with IRB protocols and subject faces have been blurred for subject privacy.)
5 PAPERS • NO BENCHMARKS YET
…The pain stimulation experiment was conducted twice: once with un-occluded face and once with facial EMG sensors.
…For privacy-preserving reasons, the face of the subjects is blurred.
10 PAPERS • 1 BENCHMARK
The German Lipreading dataset consists of 250,000 publicly available videos of the faces of speakers of the Hessian Parliament, processed for word-level lip reading using an automatic pipeline.
The MSU-MFSD dataset contains 280 video recordings of genuine and attack faces, collected from 35 individuals.
83 PAPERS • 1 BENCHMARK
…Achard, "The DAily Home LIfe Activity Dataset: A High Semantic Activity Dataset for Online Recognition," 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017.
…existing public datasets, iMiGUE focuses on nonverbal body gestures without using any identity information, while most existing research on emotion analysis concerns sensitive biometric data, like the face.
…The ground truth consists of annotations of human faces and whether they are masks or not.
…Grooming behavior in mice contains a variety of visually diverse syntaxes including but not limited to paw licking, face washing, and flank licking.