Multi-Modal CelebA-HQ

Introduced by Xia et al. in TediGAN: Text-Guided Diverse Face Image Generation and Manipulation

Multi-Modal-CelebA-HQ is a large-scale face image dataset containing 30,000 high-resolution face images selected from the CelebA dataset following the CelebA-HQ procedure. Each image is accompanied by a high-quality segmentation mask, a sketch, descriptive text, and a version of the image with a transparent background.
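As a rough illustration of how the per-image modalities listed above might be grouped in code, the following minimal Python sketch bundles the five file paths for one sample. The directory names, file extensions, and the names `FaceSample`, `build_sample`, and `missing_modalities` are hypothetical placeholders chosen for this example; they do not describe the released dataset's actual layout, which should be checked against the dataset's own documentation.

```python
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class FaceSample:
    """Paths to the modalities that accompany one face image."""
    image: Path
    mask: Path
    sketch: Path
    text: Path
    transparent: Path


def build_sample(root: Path, index: int) -> FaceSample:
    # ASSUMPTION: directory names and extensions below are illustrative
    # placeholders, not the layout of the released dataset.
    stem = str(index)
    return FaceSample(
        image=root / "image" / f"{stem}.jpg",
        mask=root / "mask" / f"{stem}.png",
        sketch=root / "sketch" / f"{stem}.jpg",
        text=root / "text" / f"{stem}.txt",
        transparent=root / "transparent" / f"{stem}.png",
    )


def missing_modalities(sample: FaceSample) -> list:
    # Return the names of modalities whose files are absent on disk.
    return [f.name for f in fields(sample)
            if not getattr(sample, f.name).exists()]
```

Calling `missing_modalities(build_sample(Path("data"), 0))` before training is one simple way to confirm that every modality is present for a given index under whatever layout is actually used.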

Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms for text-to-image generation, text-guided image manipulation, sketch-to-image generation, and GAN-based face generation and editing.

Source: Multi-Modal CelebA-HQ Dataset

Benchmarks

Task	Dataset Variant	Best Model
Text-to-Image Generation	Multi-Modal-CelebA-HQ	Swinv2-Imagen
Multimodal Generation	Multi-Modal CelebA-HQ	Diffusion
Face Sketch Synthesis	Multi-Modal CelebA-HQ	Diffusion