The ArtiFact dataset is a large-scale image dataset that aims to include a diverse collection of real and synthetic images from multiple categories, including Human/Human Faces, Animal/Animal Faces, Places, Vehicles, Art, and many other real-life objects. The dataset comprises 8 sources that were carefully chosen to ensure diversity and includes images synthesized from 25 distinct methods, including 13 GANs, 7 Diffusion, and 5 other miscellaneous generators. The dataset contains 2,496,738 images, comprising 964,989 real images and 1,531,749 fake images.
To ensure diversity across different sources, the real images of the dataset are randomly sampled from source datasets containing numerous categories, whereas synthetic images are generated within the same categories as the real images. Captions and image masks from the COCO dataset are utilized to generate images for text2image and inpainting generators, while normally distributed noise with different random seeds is used for noise2image generators. The dataset is further processed to reflect real-world scenarios by applying random cropping, downscaling, and JPEG compression, in accordance with the IEEE VIP Cup 2022 standards.
The ArtiFact dataset is intended to serve as a benchmark for evaluating the performance of synthetic image detectors under real-world conditions. It includes a broad spectrum of diversity in terms of generators used and syntheticity, providing a challenging dataset for image detection tasks.
Paper | Code | Results | Date | Stars |
---|