Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

In this research work, we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image). CLIP-GLaSS is based on the CLIP neural network, which, given an image and a descriptive caption, provides similar embeddings.
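The core idea can be illustrated compactly: encode the target caption with CLIP's text encoder, then search the latent space of a pretrained image generator for a latent whose generated image has the highest CLIP similarity to that caption. The snippet below is a minimal sketch of this search loop, not the paper's implementation: it assumes OpenAI's clip package and PyTorch, replaces the pretrained generator (BigGAN or StyleGAN2 in the paper) with a toy placeholder network, and substitutes plain random search for the paper's genetic-algorithm optimizer.

```python
# Minimal illustrative sketch of CLIP-guided latent search (not the paper's code).
# Assumptions: OpenAI's `clip` package and PyTorch are installed; `toy_generator`
# is a placeholder for the pretrained BigGAN / StyleGAN2 generator; random search
# stands in for the genetic-algorithm optimizer used in CLIP-GLaSS.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

latent_dim = 128

# Toy stand-in generator: maps a latent vector to a 3x224x224 image in [0, 1].
# A real setup would also apply CLIP's image normalization before encoding.
toy_generator = torch.nn.Sequential(
    torch.nn.Linear(latent_dim, 3 * 224 * 224),
    torch.nn.Sigmoid(),
    torch.nn.Unflatten(1, (3, 224, 224)),
).to(device)

caption = "a photo of a red sports car"

with torch.no_grad():
    # Target embedding: CLIP's text encoder applied to the caption.
    text_feat = model.encode_text(clip.tokenize([caption]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    best_z, best_score = None, float("-inf")
    for _ in range(50):                                    # search budget
        z = torch.randn(16, latent_dim, device=device)     # candidate latents
        images = toy_generator(z)                          # generated images
        img_feat = model.encode_image(images)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        scores = (img_feat * text_feat).sum(dim=-1)        # cosine similarity
        top = scores.argmax()
        if scores[top] > best_score:
            best_score, best_z = scores[top].item(), z[top]

print(f"best CLIP similarity found: {best_score:.3f}")
```

The caption-to-image and image-to-caption directions differ only in which CLIP encoder produces the fixed target embedding and which generative model's latent space is searched.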

Methods used in the Paper


METHOD                                     TYPE
Softmax                                    Output Functions
Dense Connections                          Feedforward Networks
Adam                                       Stochastic Optimization
Linear Layer                               Feedforward Networks
Dot-Product Attention                      Attention Mechanisms
1x1 Convolution                            Convolutions
Feedforward Network                        Feedforward Networks
Off-Diagonal Orthogonal Regularization     Regularization
Projection Discriminator                   Discriminators
SAGAN Self-Attention Module                Attention Modules
Truncation Trick                           Latent Variable Sampling
Spectral Normalization                     Normalization
Non-Local Operation                        Image Feature Extractors
Early Stopping                             Regularization
Conditional Batch Normalization            Normalization
Batch Normalization                        Normalization
ReLU                                       Activation Functions
SAGAN                                      Generative Adversarial Networks
GAN Hinge Loss                             Loss Functions
Residual Connection                        Skip Connections
TTUR                                       Optimization
Convolution                                Convolutions
Residual Block                             Skip Connection Blocks
Non-Local Block                            Image Model Blocks
BigGAN                                     Generative Models
Weight Demodulation                        Normalization
R1 Regularization                          Regularization
Path Length Regularization                 Regularization
Leaky ReLU                                 Activation Functions
StyleGAN2                                  Generative Models