
Content-Conditioned Style Encoder

Introduced by Saito et al. in COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

The Content-Conditioned Style Encoder, or COCO, is a style encoder used for image-to-image translation in the COCO-FUNIT architecture. Unlike the style encoder in FUNIT, COCO takes both the content and style images as input. This content conditioning scheme creates a direct feedback path during learning that lets the content image influence how the style code is computed. It also reduces the direct influence of the style image on the extracted style code.

The architecture operates as follows. First, the content image is fed into an encoder $E_{S, C}$ to compute a spatial feature map. This content feature map is then mean-pooled and mapped to a vector $\zeta_{c}$. Similarly, the style image is fed into an encoder $E_{S, S}$ to compute a spatial feature map. The style feature map is then mean-pooled and concatenated with an input-independent bias vector: the constant style bias (CSB). Note that while a regular bias in deep networks is added to the activations, the CSB is concatenated with the activations. The CSB provides a fixed input to the style encoder, which helps compute a style code that is less sensitive to variations in the style image.
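To make the concatenation point concrete, here is a minimal PyTorch sketch contrasting a regular additive bias with the concatenated CSB; the dimensions and variable names are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

feat_dim, batch = 256, 4                     # illustrative sizes
pooled_style = torch.randn(batch, feat_dim)  # mean-pooled style features
csb = nn.Parameter(torch.zeros(feat_dim))    # learned constant style bias

# A regular bias is added to the activations (dimensionality unchanged):
added = pooled_style + csb                   # shape: (batch, feat_dim)

# The CSB is instead concatenated with the activations, doubling the
# feature dimension; a fully connected layer later maps this to zeta_s.
concatenated = torch.cat([pooled_style, csb.expand(batch, -1)], dim=1)
assert concatenated.shape == (batch, 2 * feat_dim)
```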

The concatenation of the pooled style vector and the CSB is mapped to a vector $\zeta_{s}$ via a fully connected layer. We then take the element-wise product of $\zeta_{c}$ and $\zeta_{s}$ to obtain the final style code. The style code is then mapped to the AdaIN parameters used to generate the translation. Through this element-wise product, the resulting style code is heavily influenced by the content image; one way to look at this mechanism is that it produces a style code customized for the input content image.
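Putting these steps together, a minimal PyTorch sketch of the encoder could look as follows. The class name, the two-layer convolutional backbones, and all layer sizes are assumptions for illustration; the actual COCO-FUNIT configuration is in the paper and its official code.

```python
import torch
import torch.nn as nn

class COCOStyleEncoder(nn.Module):
    """Sketch of the content-conditioned style encoder (the mapping phi).

    The conv stacks below are placeholders, not the paper's exact layers.
    """

    def __init__(self, feat_dim=256, style_dim=64):
        super().__init__()
        def backbone():  # shared structure for E_{S,C} and E_{S,S}
            return nn.Sequential(
                nn.Conv2d(3, feat_dim, 7, stride=2, padding=3), nn.ReLU(),
                nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            )
        self.E_sc = backbone()                          # content branch E_{S,C}
        self.E_ss = backbone()                          # style branch E_{S,S}
        self.csb = nn.Parameter(torch.zeros(feat_dim))  # constant style bias
        self.to_zeta_c = nn.Linear(feat_dim, style_dim)
        self.to_zeta_s = nn.Linear(2 * feat_dim, style_dim)

    def forward(self, x_c, x_s):
        # Content path: spatial features -> mean pool -> zeta_c
        zeta_c = self.to_zeta_c(self.E_sc(x_c).mean(dim=(2, 3)))
        # Style path: spatial features -> mean pool -> concat CSB -> zeta_s
        pooled_s = self.E_ss(x_s).mean(dim=(2, 3))
        csb = self.csb.expand(pooled_s.size(0), -1)
        zeta_s = self.to_zeta_s(torch.cat([pooled_s, csb], dim=1))
        # Element-wise product gives the content-customized style code z_s
        return zeta_c * zeta_s
```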

The COCO is used as a drop-in replacement for the style encoder in FUNIT. Let $\phi$ denote the COCO mapping. The translation output is then computed via

$$ z_{c}=E_{C}\left(x_{c}\right), \quad z_{s}=\phi\left(E_{S, S}\left(x_{s}\right), E_{S, C}\left(x_{c}\right)\right), \quad \bar{x}=F\left(z_{c}, z_{s}\right) $$

The style code extracted by the COCO is more robust to variations in the style image. Note that we set $E_{S, C} \equiv E_{C}$ to keep the number of parameters in our model similar to that in FUNIT.
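For context on how the style code is consumed downstream, the sketch below shows one AdaIN step inside a decoder like $F$: a linear layer (here the assumed `to_adain`) maps $z_{s}$ to per-channel scale and shift parameters, which modulate normalized content features. All dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf  # aliased to avoid clashing with the decoder F

def adain(feat, gamma, beta, eps=1e-5):
    # Normalize per-channel statistics, then re-scale and shift with
    # the style-derived parameters.
    normalized = nnf.instance_norm(feat, eps=eps)
    return gamma[..., None, None] * normalized + beta[..., None, None]

style_dim, channels = 64, 256                  # illustrative sizes
z_s = torch.randn(1, style_dim)                # style code from COCO (phi)
z_c = torch.randn(1, channels, 16, 16)         # content code from E_C

to_adain = nn.Linear(style_dim, 2 * channels)  # maps z_s to (gamma, beta)
gamma, beta = to_adain(z_s).chunk(2, dim=1)
decoded = adain(z_c, gamma, beta)              # one AdaIN step inside F
```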

Source: COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder
