FaceForensics++ is a forensics dataset consisting of 1,000 original video sequences that have been manipulated with four automated face manipulation methods: Deepfakes, Face2Face, FaceSwap, and NeuralTextures. The data was sourced from 977 YouTube videos, each containing a trackable, mostly frontal face without occlusions, which enables automated tampering methods to generate realistic forgeries.
337 PAPERS • 2 BENCHMARKS
Celeb-DF is a large-scale, challenging dataset for deepfake forensics. It includes 590 original videos collected from YouTube, featuring subjects of different ages, ethnic groups, and genders, and 5,639 corresponding DeepFake videos.
165 PAPERS • NO BENCHMARKS YET
The DFDC (Deepfake Detection Challenge) is a dataset for deepfake detection consisting of more than 100,000 videos.
113 PAPERS • 1 BENCHMARK
FaceForensics is a video dataset consisting of more than 500,000 frames containing faces from 1,004 videos that can be used to study image and video forgeries. All videos were downloaded from YouTube and cut into short continuous clips containing mostly frontal faces. The dataset is available in two versions.
85 PAPERS • 1 BENCHMARK
FakeAVCeleb is a novel audio-video Deepfake dataset that contains not only deepfake videos but also the corresponding synthesized (cloned) audio.
52 PAPERS • 2 BENCHMARKS
WildDeepfake is a dataset for real-world deepfake detection consisting of 7,314 face sequences extracted from 707 deepfake videos collected entirely from the internet. WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop more effective detectors against real-world deepfakes.
43 PAPERS • NO BENCHMARKS YET
DeeperForensics-1.0 is by far the largest face forgery detection dataset, with 60,000 videos comprising a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. The full dataset includes 48,475 source videos and 11,000 manipulated videos. The source videos were collected from 100 paid, consenting actors from 26 countries, and the manipulated videos were generated with DF-VAE, a newly proposed many-to-many end-to-end face swapping method. Seven types of real-world perturbations at five intensity levels are applied to ensure larger scale and higher diversity. Image source: https://github.com/EndlessSora/DeeperForensics-1.0
26 PAPERS • NO BENCHMARKS YET
WaveFake is a dataset for audio deepfake detection consisting of over 100,000 generated audio clips.
25 PAPERS • NO BENCHMARKS YET
The Korean DeepFake Detection Dataset (KoDF) is a large-scale collection of synthesized and real videos focused on Korean subjects, used for the task of deepfake detection.
21 PAPERS • NO BENCHMARKS YET
Localized Audio Visual DeepFake Dataset (LAV-DF).
10 PAPERS • 2 BENCHMARKS
Forgery Diversity: DF40 comprises 40 distinct deepfake techniques (including both representative and SOTA methods), facilitating the detection of today's SOTA deepfakes and AIGC content. It provides 10 face-swapping methods, 13 face-reenactment methods, 12 entire-face-synthesis methods, and 5 face-editing methods.
4 PAPERS • NO BENCHMARKS YET
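The four DF40 forgery categories above can be captured as a simple mapping; a minimal sketch (the category names are paraphrased from the description, not DF40's official identifiers) confirming that the per-category counts sum to the stated 40 techniques:

```python
# DF40 forgery-type counts as stated in the description above.
df40_categories = {
    "face-swapping": 10,
    "face-reenactment": 13,
    "entire-face-synthesis": 12,
    "face-editing": 5,
}

total_methods = sum(df40_categories.values())
print(total_methods)  # 40 distinct deepfake techniques
```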
The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most research efforts in this domain focus on detecting high-quality deepfake images and videos, only a few works address the localization of small segments of audio-visual manipulation embedded in real videos. In this research, we emulate the process of such content generation and propose the AV-Deepfake1M dataset. The dataset contains content-driven (i) video manipulations, (ii) audio manipulations, and (iii) audio-visual manipulations for more than 2K subjects, resulting in a total of more than 1M videos. The paper provides a thorough description of the proposed data generation pipeline, accompanied by a rigorous analysis of the quality of the generated data. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant
3 PAPERS • NO BENCHMARKS YET
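Temporal localization benchmarks like the one above typically score predictions by intersection-over-union between predicted and ground-truth manipulated time segments. A minimal sketch of that building block (the exact evaluation protocol of AV-Deepfake1M may differ):

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two time segments (start, end) in seconds.

    A common building block for temporal forgery localization metrics
    such as AP at a given IoU threshold.
    """
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Example: a predicted manipulated segment vs. the ground-truth one.
# 1 s of overlap over a 3 s union gives IoU = 1/3.
print(temporal_iou((2.0, 4.0), (3.0, 5.0)))
```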
The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness.
3 PAPERS • 2 BENCHMARKS
DeepFake MNIST+ is a deepfake facial animation dataset. The dataset is generated by a SOTA image animation generator. It includes 10,000 facial animation videos in ten different actions, which can spoof the recent liveness detectors.
We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified image- and video-level annotations across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification; 2) Spatial Forgery Localization, which segments the manipulated area of fake images compared to their corresponding source real images; 3) Video Forgery Classification, which re-defines video-level forgery classification with manipulated frames in random positions (this task is important because attackers in the real world are free to manipulate any target frame); and 4) Temporal Forgery Localization, which localizes the temporal segments that are manipulated. ForgeryNet is by far the largest publicly available deep face forgery dataset in terms of data scale (2.9 million images, 221,247 video
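ForgeryNet's actual annotation schema is not shown here; the function below is a hypothetical sketch of how the three classification granularities in task 1 could be derived from a single annotation, assuming method id 0 denotes a real sample and ids 1 to 15 denote the forgery approaches:

```python
def classification_labels(method_id, identity_replaced):
    """Derive ForgeryNet-style 2-way / 3-way / n-way labels from one annotation.

    method_id: 0 for real, 1..15 for one of the 15 forgery approaches.
    identity_replaced: True for identity-replaced forgeries (e.g. face swaps),
    False for identity-remained forgeries (e.g. reenactment-style edits).
    """
    two_way = "fake" if method_id > 0 else "real"
    if method_id == 0:
        three_way = "real"
    elif identity_replaced:
        three_way = "fake-identity-replaced"
    else:
        three_way = "fake-identity-remained"
    n_way = method_id  # real plus 15 forgery approaches -> 16 classes
    return two_way, three_way, n_way

print(classification_labels(7, True))   # ('fake', 'fake-identity-replaced', 7)
print(classification_labels(0, False))  # ('real', 'real', 0)
```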
We created a new dataset, named DFDM, with 6,450 Deepfake videos generated by different Autoencoder models. Specifically, five Autoencoder models with variations in the encoder, decoder, intermediate layer, and input resolution, respectively, were selected to generate Deepfakes from the same input. We observed visible but subtle visual differences among the resulting Deepfakes, demonstrating evidence of model-attribution artifacts.
2 PAPERS • NO BENCHMARKS YET
DeePhy is a novel DeepFake phylogeny dataset consisting of 5,040 DeepFake videos generated using three different generation techniques. It is one of the first datasets to incorporate the concept of DeepFake phylogeny: the generation of DeepFakes by applying multiple generation techniques in a sequential manner.
A human-face Deepfake dataset sampled from larger datasets.
DEEP-VOICE: Real-Time Detection of AI-Generated Speech for DeepFake Voice Conversion. This dataset contains examples of real human speech, along with DeepFake versions of those recordings generated using Retrieval-based Voice Conversion.
1 PAPER • 1 BENCHMARK
We release the dataset for non-commercial research. Submit requests at https://forms.gle/6WPEGNWbYoEe6bte8.
1 PAPER • NO BENCHMARKS YET
SONICS is a large-scale dataset comprising 97,164 songs, 48,090 real songs from YouTube and 49,074 fake songs from Suno and Udio, designed for synthetic song detection (SSD), also known as fake song detection (FSD). It addresses several limitations of existing datasets, such as the lack of end-to-end fake songs, limited diversity in music and lyrics, and insufficient long-duration songs. The average length of the songs in SONICS is 176 seconds, which enables the capture of long-context relationships. Moreover, SONICS provides open access to the generated fake songs and is divided into 66,709 songs for training, 26,015 for testing, and 4,440 for validation. Additionally, the inclusion of song lyrics in the SONICS dataset paves the way for future research in this field.
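The split sizes quoted above can be sanity-checked against the real/fake totals; a minimal sketch using only the numbers stated in the description:

```python
# SONICS split and source counts as stated above.
splits = {"train": 66_709, "test": 26_015, "val": 4_440}
real_songs, fake_songs = 48_090, 49_074

# Both decompositions should account for all 97,164 songs.
assert sum(splits.values()) == real_songs + fake_songs == 97_164
print(sum(splits.values()))  # 97164 songs in total
```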
The DeepSpeak dataset contains over 43 hours of real and deepfake footage of people talking and gesturing in front of their webcams. The source data was collected from a diverse set of participants in their natural environments and the deepfakes were generated using state-of-the-art open-source lip-sync and face-swap software.
0 PAPERS • NO BENCHMARKS YET