Residual Multi-Layer Perceptrons, or ResMLP, is an architecture for image classification built entirely upon multi-layer perceptrons. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. At the end of the network, the patch representations are average-pooled and fed to a linear classifier.
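The alternation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the names `resmlp_layer`, `classify`, `A`, `B`, `C`, `W_cls`, and `b_cls` are hypothetical, the sizes and random weights are placeholders, biases are omitted, the per-block normalization is left out for brevity, and the GELU activation is an assumption because the description above does not name the nonlinearity.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (activation choice is an assumption here)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def resmlp_layer(x, A, B, C):
    """x: (num_patches, dim); A: (num_patches, num_patches);
    B: (hidden, dim); C: (dim, hidden)."""
    # (i) cross-patch linear layer: patches interact, with the same
    #     weights A applied to every channel; residual connection.
    z = x + A @ x
    # (ii) two-layer feed-forward network: channels interact, applied
    #      independently to each patch; residual connection.
    return z + gelu(z @ B.T) @ C.T

def classify(x, layers, W_cls, b_cls):
    for A, B, C in layers:
        x = resmlp_layer(x, A, B, C)
    pooled = x.mean(axis=0)          # average-pool over patches
    return W_cls @ pooled + b_cls    # linear classifier

# Toy usage with placeholder sizes and random weights.
rng = np.random.default_rng(0)
num_patches, dim, hidden, num_classes, depth = 16, 8, 32, 10, 2
layers = [
    (0.02 * rng.standard_normal((num_patches, num_patches)),
     0.02 * rng.standard_normal((hidden, dim)),
     0.02 * rng.standard_normal((dim, hidden)))
    for _ in range(depth)
]
patches = rng.standard_normal((num_patches, dim))
W_cls = 0.02 * rng.standard_normal((num_classes, dim))
logits = classify(patches, layers, W_cls, np.zeros(num_classes))
print(logits.shape)  # (10,)
```

Note that the cross-patch step multiplies on the left (mixing rows, i.e. patches) while the feed-forward step multiplies on the right (mixing columns, i.e. channels), which is what makes the two sublayers complementary.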
Because ResMLP has no self-attention layers, training is stable enough that layer normalization can be replaced with a simpler affine transformation, $\mathrm{Aff}_{\mathbf{\alpha},\mathbf{\beta}}(\mathbf{x}) = \mathrm{Diag}(\mathbf{\alpha})\mathbf{x} + \mathbf{\beta}$, which rescales and shifts each channel with learnable parameters. The affine operator is applied at the beginning ("pre-normalization") and end ("post-normalization") of each residual block. As a pre-normalization, Aff replaces LayerNorm without using channel-wise statistics, and it is initialized with $\mathbf{\alpha}=\mathbf{1}$ and $\mathbf{\beta}=\mathbf{0}$. As a post-normalization, Aff is similar to LayerScale, and $\mathbf{\alpha}$ is initialized with the same small value as in LayerScale.
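As a rough sketch of the two roles of the affine operator, the snippet below applies a per-channel scale and shift before and after a stand-in sublayer. The class name `Aff`, the `init_scale` parameter, the value `1e-4`, and the zero initialization of $\mathbf{\beta}$ in the post-normalization are illustrative assumptions; the text only says that the post-normalization $\mathbf{\alpha}$ starts at a small value.

```python
import numpy as np

class Aff:
    """Per-channel affine map: alpha * x + beta. Unlike LayerNorm, it
    computes no mean or variance from the input."""
    def __init__(self, dim, init_scale=1.0):
        self.alpha = np.full(dim, init_scale)  # learnable scale
        self.beta = np.zeros(dim)              # learnable shift

    def __call__(self, x):
        # Broadcasts over the patch axis: x has shape (num_patches, dim).
        return self.alpha * x + self.beta

dim = 8
pre = Aff(dim)                    # alpha = 1, beta = 0: identity at init
post = Aff(dim, init_scale=1e-4)  # small alpha (1e-4 is a placeholder value)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, dim))

sublayer = np.tanh                # stand-in for the linear or feed-forward sublayer
out = x + post(sublayer(pre(x)))  # one residual block with pre- and post-normalization
```

With this initialization, each residual branch contributes almost nothing at the start of training, so every block begins close to the identity map.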
Source: ResMLP: Feedforward networks for image classification with data-efficient training

Task | Papers | Share |
---|---|---|
Image Classification | 4 | 21.05% |
Object Detection | 2 | 10.53% |
Semantic Segmentation | 2 | 10.53% |
Color Manipulation | 1 | 5.26% |
Image Enhancement | 1 | 5.26% |
Photo Retouching | 1 | 5.26% |
Tone Mapping | 1 | 5.26% |
Adversarial Attack | 1 | 5.26% |
Instance Segmentation | 1 | 5.26% |