Compact Convolutional Transformers (CCT) utilize sequence pooling and replace the patch embedding with a convolutional embedding, introducing a better inductive bias and making positional embeddings optional. CCT achieves better accuracy than ViT-Lite (a smaller ViT) and allows greater flexibility in input image sizes.
Source: Escaping the Big Data Paradigm with Compact Transformers
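The two components described above can be summarized in a short sketch. The following is a minimal, illustrative PyTorch rendering (not the authors' reference implementation): `ConvTokenizer` stands in for the convolutional embedding that replaces ViT's patch embedding, and `SeqPool` is an attention-weighted average over the encoder's output tokens that replaces the class token; module and parameter names are assumptions for illustration.

```python
import torch
import torch.nn as nn


class ConvTokenizer(nn.Module):
    """Convolutional embedding: conv + pooling in place of ViT's patch embedding."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 256):
        super().__init__()
        self.tokenizer = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, embed_dim, H', W') -> (B, H'*W', embed_dim) token sequence
        return self.tokenizer(x).flatten(2).transpose(1, 2)


class SeqPool(nn.Module):
    """Sequence pooling: attention-weighted average of tokens instead of a class token."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.attention = nn.Linear(embed_dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, embed_dim); weights sum to 1 over the N tokens
        weights = torch.softmax(self.attention(tokens), dim=1)   # (B, N, 1)
        return (weights.transpose(1, 2) @ tokens).squeeze(1)     # (B, embed_dim)


# Toy usage: tokenize an image, (a transformer encoder would go here), then pool.
x = torch.randn(2, 3, 32, 32)
tokens = ConvTokenizer()(x)       # (2, 256, 256): 16x16 spatial positions as tokens
pooled = SeqPool()(tokens)        # (2, 256), fed to a linear classification head
```

Because the tokenizer is a plain convolution rather than a fixed patch grid, the number of tokens simply follows the input resolution, which is what makes positional embeddings optional and input sizes flexible.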
Task | Papers | Share |
---|---|---|
Image Classification | 6 | 11.76% |
Classification | 2 | 3.92% |
Retrieval | 2 | 3.92% |
Text Generation | 2 | 3.92% |
Decision Making | 2 | 3.92% |
Decoder | 2 | 3.92% |
Semantic Segmentation | 2 | 3.92% |
Facial Expression Recognition (FER) | 2 | 3.92% |
Memorization | 2 | 3.92% |