no code implementations • 8 Nov 2023 • Prasad Gabbur
For example, on ImageNet 256x256 using 10 sampling steps, we achieve an FID of 6.94 and an IS of 207.85 with a GMM kernel, compared to 10.15 and 196.73 respectively with a Gaussian kernel.
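The abstract contrasts a Gaussian reverse (denoising) kernel with a GMM kernel for few-step sampling. Below is a minimal sketch of what that choice means in a DDPM-style reverse step; `mean_net` and `mixture_net` are hypothetical stand-ins for learned networks, not the paper's actual parameterization.

```python
import numpy as np

def gaussian_kernel_step(x_t, t, mean_net, sigma_t, rng):
    """One reverse-diffusion step with the standard Gaussian kernel:
    p(x_{t-1} | x_t) = N(mu_theta(x_t, t), sigma_t^2 I).
    `mean_net` is a hypothetical network predicting the posterior mean."""
    mu = mean_net(x_t, t)
    return mu + sigma_t * rng.standard_normal(x_t.shape)

def gmm_kernel_step(x_t, t, mixture_net, rng):
    """One reverse step with a K-component GMM kernel (illustrative):
    p(x_{t-1} | x_t) = sum_k pi_k N(mu_k, sigma_k^2 I).
    `mixture_net` is a hypothetical network returning mixture weights,
    means, and per-component scales; a more expressive kernel can model
    the multimodal denoising posterior that arises with large steps."""
    pis, mus, sigmas = mixture_net(x_t, t)   # shapes: (K,), (K, *x.shape), (K,)
    k = rng.choice(len(pis), p=pis)          # sample a mixture component
    return mus[k] + sigmas[k] * rng.standard_normal(x_t.shape)
```

The intuition behind the reported gains at 10 steps is that with few, large reverse steps the true denoising distribution is far from Gaussian, so a multimodal kernel fits it better.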
1 code implementation • NeurIPS 2021 • Prasad Gabbur, Manjot Bilkhu, Javier Movellan
We provide a probabilistic interpretation of attention and show that the standard dot-product attention in transformers is a special case of Maximum A Posteriori (MAP) inference.
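A minimal sketch of this correspondence, under the common assumptions of equal mixture priors and a shared isotropic variance (`sigma2` is an illustrative parameter, not necessarily the paper's notation): posterior responsibilities in a Gaussian mixture whose component means are the keys reduce to a softmax that matches dot-product attention when key norms are equal.

```python
import numpy as np

def dot_product_attention(q, K, V):
    """Standard scaled dot-product attention for a single query."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)           # query-key similarities
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # softmax -> attention weights
    return w @ V                          # weighted average of values

def gmm_posterior_mean(q, K, V, sigma2=1.0):
    """Posterior-expected value under a Gaussian mixture with component
    means at the keys (illustrative of the probabilistic view).
    With equal priors and shared isotropic variance sigma2, the
    responsibilities are softmax(-||q - k_j||^2 / (2*sigma2)); expanding
    the square, the ||q||^2 term is constant across components, so this
    equals dot-product attention up to a -||k_j||^2/2 bias, which vanishes
    when all keys have equal norm."""
    sq_dist = ((q - K) ** 2).sum(axis=1)  # squared distance to each key
    logits = -sq_dist / (2 * sigma2)
    r = np.exp(logits - logits.max())
    r /= r.sum()                          # mixture responsibilities
    return r @ V                          # posterior-weighted values
```

Under this reading, the attention output is a single inference step in a Gaussian mixture model, and standard dot-product attention falls out as the special case described in the abstract.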