Taming Transformers for High-Resolution Image Synthesis

17 Dec 2020 · Patrick Esser, Robin Rombach, Björn Ommer

Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions.
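
Since this contrast hinges on attention's global receptive field and its cost, here is a minimal sketch of scaled dot-product attention (one of the methods listed further down). The function name and tensor shapes are illustrative assumptions, not taken from the paper's code:

```python
# Minimal sketch of scaled dot-product attention. Every token attends to
# every other token, so there is no built-in preference for local
# interactions -- but the attention matrix is N x N in the number of
# tokens N, which is what makes raw high-resolution images infeasible
# to model directly.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, num_tokens, dim) -- assumed shapes
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (batch, N, N) pairwise scores
    return F.softmax(scores, dim=-1) @ v         # weighted mix over all tokens

# Even a modest 256x256 image flattened to pixel tokens gives N = 65536,
# i.e. an attention matrix with ~4.3e9 entries per head.
x = torch.randn(1, 16, 64)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([1, 16, 64])
```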

TASK                       | DATASET                     | MODEL        | METRIC | VALUE | GLOBAL RANK
Image-to-Image Translation | ADE20K Labels-to-Photos     | Esser et al. | FID    | 35.5  | #7
Image-to-Image Translation | COCO-Stuff Labels-to-Photos | Esser et al. | FID    | 22.4  | #4

Methods used in the Paper


METHOD                       | TYPE
Residual Connection          | Skip Connections
Label Smoothing              | Regularization
Dropout                      | Regularization
BPE                          | Subword Segmentation
Adam                         | Stochastic Optimization
Dense Connections            | Feedforward Networks
Softmax                      | Output Functions
Multi-Head Attention         | Attention Modules
Layer Normalization          | Normalization
Scaled Dot-Product Attention | Attention Mechanisms
Transformer                  | Transformers
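
Most of these methods are the standard ingredients of a transformer block. As a rough illustration of how they fit together, here is a minimal block in PyTorch; the pre-norm layout, 4x feed-forward width, and GELU activation are common conventions assumed here, not details confirmed by this page:

```python
# Minimal sketch of a transformer block combining several listed methods:
# Layer Normalization, Multi-Head (Scaled Dot-Product) Attention, Dropout,
# Dense Connections, and Residual Connections. Illustrative only.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim=512, num_heads=8, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)                          # Layer Normalization
        self.attn = nn.MultiheadAttention(                      # Multi-Head Attention
            dim, num_heads, dropout=dropout, batch_first=True)  # (softmax inside)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(                                # Dense Connections
            nn.Linear(dim, 4 * dim), nn.GELU(),
            nn.Linear(4 * dim, dim), nn.Dropout(dropout))       # Dropout

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]       # Residual Connection
        x = x + self.ff(self.norm2(x))                          # Residual Connection
        return x

x = torch.randn(1, 16, 512)          # (batch, sequence of tokens, dim)
print(TransformerBlock()(x).shape)   # torch.Size([1, 16, 512])
```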