Weight Standardization

Introduced by Qiao et al. in Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

Weight Standardization (WS) is a normalization technique that smooths the loss landscape by standardizing the weights in convolutional layers. Unlike previous normalization methods, which focus on activations, WS considers the smoothing effect of the weights themselves, beyond just length-direction decoupling. Theoretically, WS reduces the Lipschitz constants of the loss and the gradients; hence, it smooths the loss landscape and improves training.

In Weight Standardization, instead of directly optimizing the loss $\mathcal{L}$ on the original weights $\hat{W}$, we reparameterize the weights $\hat{W}$ as a function of $W$, i.e. $\hat{W}=\text{WS}(W)$, and optimize the loss $\mathcal{L}$ on $W$ by SGD:

$$ \hat{W} = \Big[ \hat{W}_{i,j}~\big|~ \hat{W}_{i,j} = \dfrac{W_{i,j} - \mu_{W_{i,\cdot}}}{\sigma_{W_{i,\cdot}}+\epsilon}\Big] $$

$$ y = \hat{W}*x $$

where

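$$ \mu_{W_{i,\cdot}} = \dfrac{1}{I}\sum_{j=1}^{I}W_{i, j},~~\sigma_{W_{i,\cdot}}=\sqrt{\dfrac{1}{I}\sum_{j=1}^{I}(W_{i,j} - \mu_{W_{i,\cdot}})^2} $$

Here $i$ indexes the output channels, $I$ is the number of inputs per output channel (the input channels within the kernel region), and $*$ denotes convolution.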
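The standardization above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function name `weight_standardize`, the default `eps=1e-5`, and the example matrix are assumptions made here, and the convolution kernel is assumed to be already flattened to a 2-D matrix with one row per output channel.

```python
import math


def weight_standardize(weight, eps=1e-5):
    """Standardize each row of a 2-D weight matrix.

    `weight` is a list of rows, one row per output channel; each row holds
    the I flattened weights (input channels x kernel size) of that channel.
    The name and the eps default are illustrative assumptions.
    """
    standardized = []
    for row in weight:
        n = len(row)
        mean = sum(row) / n
        # Population standard deviation, matching the 1/I factor in the formula.
        std = math.sqrt(sum((w - mean) ** 2 for w in row) / n)
        # eps is added to the standard deviation, as in the formula above.
        standardized.append([(w - mean) / (std + eps) for w in row])
    return standardized


# Hypothetical 2 x 4 weight matrix: 2 output channels, I = 4 weights each.
W = [[1.0, 2.0, 3.0, 4.0],
     [10.0, 10.5, 9.5, 10.0]]
W_hat = weight_standardize(W)

for row in W_hat:
    row_mean = sum(row) / len(row)
    row_second_moment = sum(w * w for w in row) / len(row)
    # Each row should have mean close to 0 and second moment close to 1.
    print(round(row_mean, 6), round(row_second_moment, 4))
```

In a real network the same computation would run on the kernel tensor inside the forward pass of every training step, so that the optimizer updates $W$ while the convolution uses $\hat{W}$.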

Similar to Batch Normalization, WS controls the first and second moments of the weights of each output channel individually in convolutional layers. Many initialization methods also set the weights in a similar way, but only at the start of training. Unlike those methods, WS standardizes the weights in a differentiable way, which aims to normalize the gradients during back-propagation. Note that no affine transformation is applied to $\hat{W}$, because normalization layers such as BN or GN are assumed to normalize the output of this convolutional layer again.
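To make the differentiability concrete, the gradient with respect to $W$ can be worked out by the chain rule. The following is a sketch rather than a formula quoted from the source: it ignores $\epsilon$ and writes $g_{i,j} = \partial\mathcal{L}/\partial\hat{W}_{i,j}$, a notation introduced here for illustration only.

$$ \dfrac{\partial \mathcal{L}}{\partial W_{i,j}} = \dfrac{1}{\sigma_{W_{i,\cdot}}}\Big( g_{i,j} - \dfrac{1}{I}\sum_{k=1}^{I} g_{i,k} - \hat{W}_{i,j}\,\dfrac{1}{I}\sum_{k=1}^{I} g_{i,k}\hat{W}_{i,k} \Big) $$

Under these assumptions, the gradient for each output channel sums to zero over $j$ and is orthogonal to $\hat{W}_{i,\cdot}$, which is one way to see how standardizing the weights also constrains the gradients.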

Source: Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

Tasks

Task	Papers	Share
Image Classification	4	20.00%
Few-Shot Learning	2	10.00%
Fine-Grained Image Classification	2	10.00%
Active Learning	1	5.00%
Domain Adaptation	1	5.00%
Unsupervised Domain Adaptation	1	5.00%
Language Modelling	1	5.00%
Quantization	1	5.00%
Depth Estimation	1	5.00%

Categories

Normalization