Spatial Feature Transform, or SFT, is a layer that generates affine transformation parameters for spatial-wise feature modulation, and was originally proposed within the context of image super-resolution. An SFT layer learns a mapping function $\mathcal{M}$ that outputs a modulation parameter pair $(\mathbf{\gamma}, \mathbf{\beta})$ based on some prior condition $\Psi$. The learned parameter pair adaptively influences the outputs by applying an affine transformation spatially to each intermediate feature map in a super-resolution (SR) network. During testing, only a single forward pass is needed to generate the high-resolution (HR) image given the low-resolution (LR) input and segmentation probability maps.
More precisely, the prior $\Psi$ is modeled by a pair of affine transformation parameters $(\mathbf{\gamma}, \mathbf{\beta})$ through a mapping function $\mathcal{M}: \Psi \mapsto(\mathbf{\gamma}, \mathbf{\beta})$. Consequently,
$$ \hat{\mathbf{y}}=G_{\mathbf{\theta}}(\mathbf{x} \mid \mathbf{\gamma}, \mathbf{\beta}), \quad(\mathbf{\gamma}, \mathbf{\beta})=\mathcal{M}(\Psi) $$
After obtaining $(\mathbf{\gamma}, \mathbf{\beta})$ from conditions, the transformation is carried out by scaling and shifting feature maps of a specific layer:
$$ \operatorname{SFT}(\mathbf{F} \mid \mathbf{\gamma}, \mathbf{\beta})=\mathbf{\gamma} \odot \mathbf{F}+\mathbf{\beta} $$
where $\mathbf{F}$ denotes the feature maps, whose dimensions are the same as those of $\mathbf{\gamma}$ and $\mathbf{\beta}$, and $\odot$ denotes element-wise multiplication, i.e., the Hadamard product. Since the spatial dimensions are preserved, the SFT layer performs not only feature-wise manipulation but also spatial-wise transformation.
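The scale-and-shift operation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the `mapping` function below is a hypothetical stand-in for $\mathcal{M}$ that uses a single per-pixel linear map (equivalent to a 1×1 convolution) with random weights, and the shapes `C`, `K`, `H`, `W` are arbitrary values chosen for illustration. A learned $\mathcal{M}$ would be trained jointly with the SR network.

```python
import numpy as np


def sft(features, gamma, beta):
    """Apply SFT(F | gamma, beta) = gamma ⊙ F + beta element-wise."""
    if not (features.shape == gamma.shape == beta.shape):
        raise ValueError("features, gamma and beta must share one shape")
    return gamma * features + beta


def mapping(prior, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical stand-in for M: map a prior (K, H, W) to (gamma, beta).

    Each output pixel is a linear function of the K prior values at the
    same location, so gamma and beta keep the spatial layout of the prior.
    """
    gamma = np.einsum("ck,khw->chw", w_gamma, prior) + b_gamma[:, None, None]
    beta = np.einsum("ck,khw->chw", w_beta, prior) + b_beta[:, None, None]
    return gamma, beta


# Illustrative sizes (assumptions): C feature channels, K prior channels.
C, K, H, W = 4, 3, 5, 5
rng = np.random.default_rng(0)

features = rng.standard_normal((C, H, W))       # feature maps F
prior = rng.random((K, H, W))
prior /= prior.sum(axis=0, keepdims=True)       # per-pixel probabilities

# Random weights stand in for learned parameters of the mapping.
w_gamma = rng.standard_normal((C, K))
b_gamma = np.ones(C)
w_beta = rng.standard_normal((C, K))
b_beta = np.zeros(C)

gamma, beta = mapping(prior, w_gamma, b_gamma, w_beta, b_beta)
out = sft(features, gamma, beta)
print(out.shape)  # (4, 5, 5)
```

Because `gamma` and `beta` carry a value for every channel and every pixel, the modulation can differ from one image region to another, which is the spatial-wise behaviour described above. Setting `gamma` to all ones and `beta` to all zeros leaves the features unchanged.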
Source: Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
Task | Papers | Share |
---|---|---|
Face Recognition | 2 | 12.50% |
Blind Face Restoration | 2 | 12.50% |
Image Super-Resolution | 2 | 12.50% |
Super-Resolution | 2 | 12.50% |
Diffusion Personalization | 1 | 6.25% |
Diffusion Personalization Tuning Free | 1 | 6.25% |
Face Generation | 1 | 6.25% |
Face Verification | 1 | 6.25% |
Image Classification | 1 | 6.25% |