Spatial Feature Transform

Introduced by Wang et al. in Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform

Spatial Feature Transform, or SFT, is a layer that generates affine transformation parameters for spatial-wise feature modulation. It was originally proposed in the context of image super-resolution (SR). An SFT layer learns a mapping function $\mathcal{M}$ that outputs a modulation parameter pair $(\mathbf{\gamma}, \mathbf{\beta})$ based on some prior condition $\Psi$. The learned parameter pair adaptively influences the network's output by applying a spatially varying affine transformation to each intermediate feature map in an SR network. During testing, only a single forward pass is needed to generate the high-resolution (HR) image from the low-resolution (LR) input and the segmentation probability maps, which serve as the prior condition.

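More precisely, the prior $\Psi$ is modeled by a pair of affine transformation parameters $(\mathbf{\gamma}, \mathbf{\beta})$ through a mapping function $\mathcal{M}: \Psi \mapsto(\mathbf{\gamma}, \mathbf{\beta})$. Consequently, the output $\hat{\mathbf{y}}$ of the SR network $G_{\mathbf{\theta}}$ for an LR input $\mathbf{x}$ depends on the prior only through these parameters: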

$$ \hat{\mathbf{y}}=G_{\mathbf{\theta}}(\mathbf{x} \mid \mathbf{\gamma}, \mathbf{\beta}), \quad(\mathbf{\gamma}, \mathbf{\beta})=\mathcal{M}(\Psi) $$
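
As a rough illustration of the mapping $\mathcal{M}$, the NumPy sketch below turns a stack of condition maps $\Psi$ into a pair $(\mathbf{\gamma}, \mathbf{\beta})$ with the same spatial size. The architecture is an assumption made for illustration, not the one confirmed by the source: a shared 1x1 convolution with ReLU followed by two 1x1 convolution heads. The names `conv1x1`, `init_params`, and `mapping`, the layer sizes, and the random weights (standing in for learned ones) are all hypothetical.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution written as a per-pixel linear map.

    x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,).
    """
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def init_params(num_conditions, hidden, channels, seed=0):
    """Random weights standing in for learned ones (illustration only)."""
    rng = np.random.default_rng(seed)

    def layer(c_out, c_in):
        return rng.normal(0.0, 0.1, (c_out, c_in)), np.zeros(c_out)

    return {
        "shared": layer(hidden, num_conditions),
        "gamma": layer(channels, hidden),
        "beta": layer(channels, hidden),
    }

def mapping(psi, params):
    """Hypothetical M: psi (K, H, W) -> gamma and beta, each (C, H, W)."""
    hidden = np.maximum(conv1x1(psi, *params["shared"]), 0.0)  # ReLU
    gamma = conv1x1(hidden, *params["gamma"])
    beta = conv1x1(hidden, *params["beta"])
    return gamma, beta

# Example: K = 3 segmentation probability maps on a 4x4 grid.
rng = np.random.default_rng(1)
logits = rng.normal(size=(3, 4, 4))
psi = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # softmax over classes
gamma, beta = mapping(psi, init_params(num_conditions=3, hidden=8, channels=16))
print(gamma.shape, beta.shape)  # (16, 4, 4) (16, 4, 4)
```

Because every step in this sketch acts on each pixel independently, two locations with the same condition vector receive the same $(\mathbf{\gamma}, \mathbf{\beta})$, while locations with different conditions can receive different parameters.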

After obtaining $(\mathbf{\gamma}, \mathbf{\beta})$ from conditions, the transformation is carried out by scaling and shifting feature maps of a specific layer:

$$ \operatorname{SFT}(\mathbf{F} \mid \mathbf{\gamma}, \mathbf{\beta})=\mathbf{\gamma} \odot \mathbf{F}+\mathbf{\beta} $$

where $\mathbf{F}$ denotes the feature maps, whose dimensions are the same as those of $\mathbf{\gamma}$ and $\mathbf{\beta}$, and $\odot$ denotes element-wise multiplication, i.e., the Hadamard product. Because the spatial dimensions are preserved, the SFT layer performs not only feature-wise manipulation but also spatial-wise transformation.
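
The scale-and-shift step can be sketched in a few lines of NumPy. This is a minimal illustration of the formula only; the function name `sft`, the channel-first `(C, H, W)` layout, and the toy values are assumptions, not part of the source.

```python
import numpy as np

def sft(features, gamma, beta):
    """Apply SFT(F | gamma, beta) = gamma * F + beta element-wise."""
    if not (features.shape == gamma.shape == beta.shape):
        raise ValueError("F, gamma and beta must share the same (C, H, W) shape")
    return gamma * features + beta

# One channel on a 2x2 grid: every position gets its own scale and shift.
F = np.array([[[1.0, 2.0], [3.0, 4.0]]])
gamma = np.array([[[1.0, 0.5], [2.0, 0.0]]])
beta = np.array([[[0.0, 1.0], [-1.0, 3.0]]])
out = sft(F, gamma, beta)
print(out[0].tolist())  # [[1.0, 2.0], [5.0, 3.0]]
```

A purely feature-wise modulation would correspond to the special case in which $\mathbf{\gamma}$ and $\mathbf{\beta}$ are constant across the height and width axes of each channel.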

Source: Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform

Tasks

Task	Papers	Share
Face Recognition	2	12.50%
Blind Face Restoration	2	12.50%
Image Super-Resolution	2	12.50%
Super-Resolution	2	12.50%
Diffusion Personalization	1	6.25%
Diffusion Personalization Tuning Free	1	6.25%
Face Generation	1	6.25%
Face Verification	1	6.25%
Image Classification	1	6.25%

Categories

Image Model Blocks