gMLP is an MLP-based alternative to Transformers that does without self-attention: it consists solely of channel projections and spatial projections with static parameterization, built from basic MLP layers with gating. The model is a stack of $L$ blocks of identical size and structure. Let $X \in \mathbb{R}^{n \times d}$ be the token representations with sequence length $n$ and dimension $d$. Each block is defined as:
$$ Z=\sigma(X U), \quad \tilde{Z}=s(Z), \quad Y=\tilde{Z} V $$
where $\sigma$ is an activation function such as GeLU, and $U$ and $V$ are linear projections along the channel dimension, the same as those in the FFNs of Transformers (e.g., their shapes are $768 \times 3072$ and $3072 \times 768$ for $\text{BERT}_{\text{base}}$).
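For concreteness, here is a minimal PyTorch sketch of the two channel projections with the $\text{BERT}_{\text{base}}$ shapes quoted above (class and attribute names are illustrative). With $s$ left as the identity, this is exactly a Transformer FFN; the spatial layer is sketched further below.

```python
import torch
import torch.nn as nn

class ChannelProjections(nn.Module):
    """The two channel projections U and V around a spatial layer s(.).

    With s = identity this reduces to a regular Transformer FFN.
    Default shapes follow the BERT_base example (768 -> 3072 -> 768).
    """

    def __init__(self, d_model=768, d_ffn=3072):
        super().__init__()
        self.proj_u = nn.Linear(d_model, d_ffn)  # U: d -> d_ffn
        self.proj_v = nn.Linear(d_ffn, d_model)  # V: d_ffn -> d
        self.act = nn.GELU()                     # sigma
        self.s = nn.Identity()                   # placeholder spatial layer

    def forward(self, x):                        # x: (batch, n, d)
        z = self.act(self.proj_u(x))             # Z = sigma(X U)
        z_tilde = self.s(z)                      # Z~ = s(Z)
        return self.proj_v(z_tilde)              # Y = Z~ V
```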
A key ingredient is $s(\cdot)$, a layer that captures spatial interactions. When $s$ is an identity mapping, the transformation above degenerates to a regular FFN, in which individual tokens are processed independently without any cross-token communication. A major focus is therefore to design an $s$ capable of capturing complex spatial interactions across tokens. This leads to the Spatial Gating Unit, a modified linear gating across the spatial dimension, sketched below.
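The text above does not spell out the gating. In the paper, the Spatial Gating Unit splits $Z$ into $(Z_1, Z_2)$ along the channel dimension and computes $s(Z) = Z_1 \odot f(Z_2)$, where $f$ is a linear projection over the sequence (spatial) dimension; the normalization of $Z_2$ and the near-identity initialization also follow the paper. A minimal sketch along those lines (names are illustrative):

```python
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    """Spatial Gating Unit: s(Z) = Z1 * f(Z2), with f a linear map over tokens.

    Z is split in half along channels; one half gates the other after a
    token-to-token (spatial) linear projection. The projection is initialized
    near zero with bias 1 so the unit starts out close to identity on Z1.
    """

    def __init__(self, d_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Linear map across the sequence (spatial) dimension: n -> n.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        nn.init.zeros_(self.spatial_proj.weight)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, z):                        # z: (batch, n, d_ffn)
        z1, z2 = z.chunk(2, dim=-1)              # split along channels
        z2 = self.norm(z2)
        z2 = self.spatial_proj(z2.transpose(1, 2)).transpose(1, 2)  # mix tokens
        return z1 * z2                           # element-wise gating
```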
The overall block layout is inspired by inverted bottlenecks, which define $s(\cdot)$ as a spatial depthwise convolution. Note that, unlike Transformers, gMLP does not require position embeddings, because such information is captured in $s(\cdot)$.
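Putting the pieces together, here is a sketch of one full block, reusing the `SpatialGatingUnit` sketched above. The layer normalization and residual shortcut correspond to the Normalization and Skip Connections components listed below; $V$ maps the halved width $d_{ffn}/2$ back to $d$ because the SGU splits the channels. Class and parameter names (e.g. `seq_len`) are illustrative.

```python
import torch
import torch.nn as nn

class GMLPBlock(nn.Module):
    """One gMLP block: norm -> U -> GELU -> Spatial Gating Unit -> V,
    wrapped in a residual connection."""

    def __init__(self, d_model=768, d_ffn=3072, seq_len=128):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_u = nn.Linear(d_model, d_ffn)       # U
        self.act = nn.GELU()                          # sigma
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)  # s(.), halves the width
        self.proj_v = nn.Linear(d_ffn // 2, d_model)  # V

    def forward(self, x):                             # x: (batch, n, d)
        shortcut = x
        z = self.act(self.proj_u(self.norm(x)))       # Z = sigma(X U)
        z_tilde = self.sgu(z)                         # Z~ = s(Z)
        return shortcut + self.proj_v(z_tilde)        # Y = Z~ V, plus residual


# A model is a stack of L identical blocks; no position embeddings are needed.
x = torch.randn(2, 128, 768)                          # (batch, n, d)
model = nn.Sequential(*[GMLPBlock(seq_len=128) for _ in range(4)])
print(model(x).shape)                                 # torch.Size([2, 128, 768])
```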
Source: Pay Attention to MLPs
Tasks addressed in papers that use gMLP:

Task | Papers | Share |
---|---|---|
Image Classification | 3 | 11.11% |
Instance Segmentation | 2 | 7.41% |
Object Detection | 2 | 7.41% |
Semantic Segmentation | 2 | 7.41% |
Question Answering | 2 | 7.41% |
Graph Representation Learning | 1 | 3.70% |
Node Classification | 1 | 3.70% |
Classification | 1 | 3.70% |
Decoder | 1 | 3.70% |
Components used in gMLP:

Component | Type |
---|---|
GELU | Activation Functions |
Layer Normalization | Normalization |
Residual Connection | Skip Connections |
Dense Connections | Feedforward Networks |