Residual Multi-Layer Perceptrons

Introduced by Touvron et al. in ResMLP: Feedforward networks for image classification with data-efficient training

Residual Multi-Layer Perceptrons, or ResMLP, is an architecture for image classification built entirely upon multi-layer perceptrons. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact, independently per patch. At the end of the network, the patch representations are average-pooled and fed to a linear classifier.
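The alternation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the `make_layer` helper, the GELU non-linearity, the hidden width, and the initialization scale are all assumptions made for the example, and the affine normalization is omitted to keep the two sublayers visible.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU (the choice of non-linearity is an assumption here).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def cross_patch_sublayer(x, w_patch, b_patch):
    # x: (num_patches, dim). One linear layer mixes the patches;
    # the same weights are applied to every channel independently.
    return x + (w_patch @ x + b_patch[:, None])

def cross_channel_sublayer(x, w1, b1, w2, b2):
    # Two-layer feed-forward network mixing channels;
    # applied to every patch independently.
    return x + (gelu(x @ w1 + b1) @ w2 + b2)

def make_layer(rng, num_patches, dim, hidden):
    # Hypothetical helper: small random weights, zero biases.
    s = 0.02
    return {
        "patch": (s * rng.standard_normal((num_patches, num_patches)),
                  np.zeros(num_patches)),
        "channel": (s * rng.standard_normal((dim, hidden)), np.zeros(hidden),
                    s * rng.standard_normal((hidden, dim)), np.zeros(dim)),
    }

def resmlp_forward(x, layers, w_cls, b_cls):
    # Alternate the two residual sublayers, then average-pool over patches
    # and apply a linear classifier.
    for layer in layers:
        x = cross_patch_sublayer(x, *layer["patch"])
        x = cross_channel_sublayer(x, *layer["channel"])
    pooled = x.mean(axis=0)
    return pooled @ w_cls + b_cls
```

With `x` of shape `(num_patches, dim)`, `resmlp_forward` returns one logit per class. Note that the cross-patch weight matrix has shape `(num_patches, num_patches)`, so in this sketch the number of patches is fixed by the parameters.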

Because the absence of self-attention layers makes training more stable, layer normalization is replaced with a simpler affine transformation, $\mathrm{Aff}_{\boldsymbol{\alpha},\boldsymbol{\beta}}(\mathbf{x}) = \mathrm{Diag}(\boldsymbol{\alpha})\,\mathbf{x} + \boldsymbol{\beta}$, which rescales and shifts each channel with learned parameters. The Aff operator is applied at the beginning ("pre-normalization") and end ("post-normalization") of each residual block. As a pre-normalization, Aff replaces LayerNorm without using channel-wise statistics, and is initialized with $\boldsymbol{\alpha}=\mathbf{1}$ and $\boldsymbol{\beta}=\mathbf{0}$. As a post-normalization, Aff is similar to LayerScale, and $\boldsymbol{\alpha}$ is initialized with the same small value as in LayerScale.
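A minimal NumPy sketch of the affine operator and its two placements in a residual block follows. The class and function names are hypothetical, and the default `post_alpha_init=1e-4` is only a placeholder for "a small value"; it is an assumption for illustration, not a value taken from the source.

```python
import numpy as np

class Aff:
    """Per-channel affine map: alpha * x + beta, with no data statistics."""

    def __init__(self, dim, alpha_init=1.0):
        self.alpha = np.full(dim, alpha_init)  # learned scale, one per channel
        self.beta = np.zeros(dim)              # learned shift, one per channel

    def __call__(self, x):
        # x: (num_patches, dim); alpha and beta broadcast over patches.
        return self.alpha * x + self.beta

def residual_block(x, sublayer, dim, post_alpha_init=1e-4):
    # Pre-normalization: alpha = 1, beta = 0, so Aff starts as the identity.
    pre = Aff(dim, alpha_init=1.0)
    # Post-normalization: small alpha (placeholder value, an assumption),
    # so the residual branch starts close to zero, as with LayerScale.
    post = Aff(dim, alpha_init=post_alpha_init)
    return x + post(sublayer(pre(x)))
```

Unlike LayerNorm, `Aff` computes no mean or variance over the input, so its output for one patch never depends on the values of any other entry beyond the per-channel scale and shift.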

Source: ResMLP: Feedforward networks for image classification with data-efficient training
