At inference time, the model begins with generating all tokens of an image simultaneously, and then refines the image iteratively conditioned on the previous generation.
Ranked #2 on Text-to-Image Generation on LHQC
During inference, BLT first generates a draft layout from the input and then iteratively refines it into a high-quality layout by masking out low-confident attributes.
In this work, we present Pyramid Adversarial Training, a simple and effective technique to improve ViT's overall performance.
Ranked #3 on Domain Generalization on ImageNet-C (using extra training data)
We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research.
Ranked #1 on Image Inpainting on Places2 val
We present SLIDE, a modular and unified system for single image 3D photography that uses a simple yet effective soft layering strategy to better preserve appearance details in novel views.
Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
Ranked #31 on Image Generation on CIFAR-10
Existing methods struggle to extrapolate images with salient objects in the foreground or are limited to very specific objects such as humans, but tend to work well on indoor/outdoor scenes.
Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images.
Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications.
Although deep learning has enabled a huge leap forward in image inpainting, current methods are often unable to synthesize realistic high-frequency details.
Watermarking is the process of embedding information into an image that can survive under distortions, while requiring the encoded image to have little or no perceptual difference from the original image.
Garment transfer is a challenging task that requires (i) disentangling the features of the clothing from the body pose and shape and (ii) realistic synthesis of the garment texture on the new body.
Ranked #1 on Virtual Try-on on FashionIQ (using extra training data)
This paper introduces an automatic method for editing a portrait photo so that the subject appears to be wearing makeup in the style of another person in a reference photo.