The Affine Operator is an affine transformation layer introduced in the ResMLP architecture. It replaces the layer normalization used in Transformer-based networks; this is possible because ResMLP contains no self-attention layers, which makes training more stable and allows a simpler affine transformation.
The affine operator is defined as:
$$ \operatorname{Aff}_{\mathbf{\alpha}, \mathbf{\beta}}(\mathbf{x})=\operatorname{Diag}(\mathbf{\alpha}) \mathbf{x}+\mathbf{\beta} $$
where $\alpha$ and $\beta$ are learnable weight vectors. The operation only rescales and shifts the input element-wise. It has several advantages over other normalization operations: first, unlike Layer Normalization, it has no cost at inference time, since it can be absorbed into the adjacent linear layer. Second, unlike BatchNorm and Layer Normalization, the Aff operator does not depend on batch statistics.
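The definition and the inference-time folding can be sketched in a few lines of numpy. This is an illustrative sketch, not the ResMLP reference code; the function names `aff` and `fold_aff_into_linear` are made up for this example. The folding identity is $W(\operatorname{Diag}(\alpha)\mathbf{x}+\beta)+b = (W\operatorname{Diag}(\alpha))\mathbf{x} + (W\beta + b)$.

```python
import numpy as np

def aff(x, alpha, beta):
    # Diag(alpha) @ x + beta is just an element-wise rescale and shift.
    return alpha * x + beta

def fold_aff_into_linear(W, b, alpha, beta):
    # Absorb Aff into the following linear layer W @ x + b:
    # W @ (alpha * x + beta) + b == (W * alpha) @ x + (W @ beta + b).
    # W * alpha scales column j of W by alpha[j], i.e. W @ Diag(alpha).
    return W * alpha, W @ beta + b

rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)
alpha, beta = rng.standard_normal(d), rng.standard_normal(d)
W, b = rng.standard_normal((d, d)), rng.standard_normal(d)

# The folded layer matches linear(Aff(x)) exactly, so Aff is free at inference.
W_folded, b_folded = fold_aff_into_linear(W, b, alpha, beta)
assert np.allclose(W @ aff(x, alpha, beta) + b, W_folded @ x + b_folded)
```

Because the fold is exact, a trained network can replace every Aff + linear pair with a single linear layer before deployment.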
Source: ResMLP: Feedforward networks for image classification with data-efficient training

Task                               Papers  Share

Image Classification               3       21.43%
Object Detection                   2       14.29%
Semantic Segmentation              2       14.29%
Out-of-Distribution Detection      1       7.14%
Adversarial Attack                 1       7.14%
Instance Segmentation              1       7.14%
Fine-Grained Image Classification  1       7.14%
General Classification             1       7.14%
Machine Translation                1       7.14%