MLP-Mixer

Introduced by Tolstikhin et al. in MLP-Mixer: An all-MLP Architecture for Vision

The MLP-Mixer architecture (or “Mixer” for short) is a vision architecture that uses neither convolutions nor self-attention. Instead, it is built entirely from multi-layer perceptrons (MLPs) that are applied repeatedly across either spatial locations or feature channels. Mixer relies only on basic matrix multiplication routines, changes to data layout (reshapes and transpositions), and scalar nonlinearities.

Mixer takes as input a sequence of linearly projected image patches (also referred to as tokens), arranged as a “patches × channels” table, and preserves the shape of this table through every layer. Mixer makes use of two types of MLP layers: channel-mixing MLPs and token-mixing MLPs. The channel-mixing MLPs allow communication between different channels; they operate on each token independently and take individual rows of the table as inputs. The token-mixing MLPs allow communication between different spatial locations (tokens); they operate on each channel independently and take individual columns of the table as inputs. These two types of layers are interleaved so that information can flow across both input dimensions.
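
The row-wise and column-wise behaviour described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the image size, patch size, hidden widths, random weights, and the helper names patchify, mlp, and init are all assumptions chosen for the example, and layer normalization and skip connections are left out to keep the focus on the two mixing directions.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # Two dense layers with a GELU in between, applied along the last axis
    return gelu(x @ w1 + b1) @ w2 + b2

def patchify(image, patch):
    # (H, W, C) -> (num_patches, patch * patch * C) using only reshape/transpose
    h, w, c = image.shape
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)

rng = np.random.default_rng(0)

def init(d_in, d_hidden):
    # Hypothetical initializer: small random weights, zero biases
    return (rng.normal(size=(d_in, d_hidden)) * 0.02, np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_in)) * 0.02, np.zeros(d_in))

# Illustrative sizes (assumptions, not values from the paper)
image = rng.normal(size=(32, 32, 3))
patch, channels = 8, 24

patches = patchify(image, patch)                 # (16, 192)
w_embed = rng.normal(size=(patches.shape[1], channels)) * 0.02
table = patches @ w_embed                        # (16, 24): patches x channels

tok_params = init(table.shape[0], 32)            # MLP over the token axis
ch_params = init(channels, 64)                   # MLP over the channel axis

# Token mixing: transpose so each column of the table becomes a row,
# apply the MLP, then transpose back.
token_mixed = mlp(table.T, *tok_params).T

# Channel mixing: the MLP acts on each row (token) independently.
channel_mixed = mlp(token_mixed, *ch_params)

print(table.shape, token_mixed.shape, channel_mixed.shape)
```

Both mixing steps return a table of the same shape as their input, which is what allows them to be interleaved. The same weights are shared across all columns in the token-mixing step and across all rows in the channel-mixing step.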

Source: MLP-Mixer: An all-MLP Architecture for Vision

Tasks

Task	Papers	Share
Image Classification	20	16.53%
Object Detection	7	5.79%
Semantic Segmentation	6	4.96%
Classification	5	4.13%
Image-to-Image Translation	3	2.48%
Translation	3	2.48%
Anomaly Detection	2	1.65%
Time Series Analysis	2	1.65%
Multiple Instance Learning	2	1.65%

Components

Component	Type
Dense Connections	Feedforward Networks
Dropout	Regularization
GELU	Activation Functions
Global Average Pooling	Pooling Operations
Layer Normalization	Normalization
Residual Connection	Skip Connections
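
As a rough illustration of how the components listed above might fit together, the NumPy sketch below assembles one possible Mixer-style forward pass: layer normalization before each MLP, dense layers with GELU, a residual connection around each MLP, and global average pooling before a linear classifier head. The sizes, random weights, and helper names (init_mlp, mlp_block, mixer_layer, classify) are assumptions made for the example. Dropout is omitted because it acts as the identity at inference time, and the learned scale and shift of layer normalization are left out for brevity.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) axis; learned scale/shift omitted
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_mlp(rng, d_in, d_hidden):
    # Hypothetical initializer: small random weights, zero biases
    return {"w1": rng.normal(size=(d_in, d_hidden)) * 0.02, "b1": np.zeros(d_hidden),
            "w2": rng.normal(size=(d_hidden, d_in)) * 0.02, "b2": np.zeros(d_in)}

def mlp_block(x, p):
    # Dense -> GELU -> Dense along the last axis
    return gelu(x @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]

def mixer_layer(x, tok, ch):
    # x has shape (tokens, channels)
    x = x + mlp_block(layer_norm(x).T, tok).T   # token mixing + residual
    x = x + mlp_block(layer_norm(x), ch)        # channel mixing + residual
    return x

def classify(x, layers, head_w, head_b):
    for tok, ch in layers:
        x = mixer_layer(x, tok, ch)
    pooled = layer_norm(x).mean(axis=0)         # global average pooling over tokens
    return pooled @ head_w + head_b             # linear classifier head

rng = np.random.default_rng(0)
tokens, channels, num_classes = 16, 24, 10      # illustrative sizes
layers = [(init_mlp(rng, tokens, 32), init_mlp(rng, channels, 64)) for _ in range(2)]
head_w = rng.normal(size=(channels, num_classes)) * 0.02
head_b = np.zeros(num_classes)

x = rng.normal(size=(tokens, channels))         # stand-in for embedded patches
logits = classify(x, layers, head_w, head_b)
print(logits.shape)  # (10,)
```

Because each MLP sits inside a residual connection, a layer whose output weights are zero reduces to the identity, which is one reason skip connections make stacks of such layers easier to train.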

Categories

Image Models