Reversible-Network Style Transfer - Solving the Content Leak Problem. [CVPR 2021] ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows

[CVPR 2021] ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows

Jie An^1*   Siyu Huang^2*   Yibing Song^3   Dejing Dou^2   Wei Liu^4   Jiebo Luo^1

^1 University of Rochester   ^2 Baidu Research   ^3 Tencent AI Lab   ^4 Tencent Data Platform

[pdf] [GitHub]

This paper aims to solve the content leak problem that arises during image style transfer, as illustrated in the figure below.

Figure 1. Content leak visualization. Existing style transfer methods are not effective at preserving image content after several rounds of the stylization process, as shown in (d), although their performance is state-of-the-art in the first round, as shown in (c).


Contents

[CVPR 2021] ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows

Abstract

1. Introduction

2. Related Work

Neural flows

3. Pre-analysis

3.1. What is Content Leak?

3.2. Why Does Content Leak Happen?

4.  Method

4.1. Overview of the ArtFlow Framework

4.2. Projection Flow Network

4.3. Unbiased Content-Style Separation


Abstract

Universal style transfer retains styles from reference images in content images. While existing methods have achieved state-of-the-art style transfer performance, they are not aware of the content leak phenomenon, in which the image content may be corrupted after several rounds of the stylization process.

In this paper, we propose ArtFlow to prevent content leak during universal style transfer. ArtFlow consists of reversible neural flows and an unbiased feature transfer module. It supports both forward and backward inferences and operates in a projection-transfer-reversion scheme. The forward inference projects input images into deep features, while the backward inference remaps deep features back to input images in a lossless and unbiased way.

Extensive experiments demonstrate that ArtFlow achieves comparable performance to state-of-the-art style transfer methods while avoiding content leak.


1. Introduction

Neural style transfer aims at transferring the artistic style from a reference image to a content image.

Starting from [11, 13], numerous works based on iterative optimization [12, 44, 30, 34] and feed-forward networks [23, 53, 3, 63] improve style transfer in terms of either visual quality or computational efficiency. Despite tremendous efforts, these methods do not generalize well to multiple types of style transfer.

Universal style transfer (UST) is proposed to improve this generalization ability. The representative UST methods include AdaIN [20], WCT [32], and Avatar-Net [45]. These methods are continuously extended by [15, 22, 60, 1, 45, 33, 40, 31, 2, 56]. While achieving favorable results as well as generalization, these methods are limited in disentangling and reconstructing image content during the stylization process.

Fig. 1 shows some examples. Existing methods [32, 20, 45] effectively stylize content images in (c). However, image contents are corrupted after several rounds of the stylization process, in which we feed the reference image and the output result back into these methods. We define this phenomenon as content leak and provide an analysis in the following:

Problem: this passage introduces the content leak problem that the authors identified in existing style transfer work.


Content leak appears due to the design of UST methods that usually consist of three parts: the first part is a fixed encoder for image embedding, the second part is a learnable decoder to remap deep features back to images, and the third part is a style transfer module based on deep features.

We observe that the first part is fixed. The appearance of content leak indicates the accumulated image reconstruction errors brought by the decoder, or the biased training process of either the decoder or the style transfer module.

Specifically, the content leak of WCT [32] and its variants [31, 40, 56] is mainly caused by the image reconstruction error of the decoder.

The content leaks of the AdaIN series [20, 22, 60] and Avatar-Net [45] are additionally caused by biased decoder training and a biased style transfer module, respectively. Sec. 3 shows more analyses.

Cause of the problem: to solve the content leak problem, one must first figure out why it arises.

Note: since the first part (the encoder) is fixed, it cannot itself be the source of the content leak; the leak must therefore stem from the decoder's accumulated reconstruction error, or from the biased training of the decoder or the style transfer module.

In this work, we propose an unbiased style transfer framework called ArtFlow to robustify existing UST methods by overcoming content leak. Different from the prevalent encoder-transfer-decoder structure, ArtFlow introduces both forward and backward inferences to formulate a projection-transfer-reversion pipeline. This pipeline is based on neural flows [5] and only contains a Projection Flow Network (PFN) in conjunction with an unbiased feature transfer module.

The neural flow refers to a number of deep generative models [5, 18] which estimate density through a series of reversible transformations.

Our PFN follows the neural flow model GLOW [28] which consists of a chain of revertible operators including activation normalization layers, invertible 1 × 1 convolutions, and affine coupling layers [6].

Fig. 2 shows the structure of ArtFlow. It first projects both the content and style images into latent representations via forward inference. Then, it makes unbiased style transfer upon deep features and reconstructs the stylized images via reversed feature inference.

Solution: this part introduces the basic components of ArtFlow.


The proposed PFN avoids the image reconstruction error and image recovery bias which usually appear in the encoder-decoder framework. PFN allows unbiased and lossless feature extraction and image recovery. To this end, PFN facilitates the comparison of style transfer modules in a fair manner.

Based on PFN, we perform theoretical and empirical analyses of the inherent biases of style transfer modules adopted by WCT, AdaIN, and Avatar-Net. We show that the transfer modules of AdaIN and WCT are unbiased, while the transfer module of Avatar-Net is biased towards style. Consequently, we adopt the transfer modules of AdaIN and WCT as the transfer modules for ArtFlow to achieve an unbiased style transfer.

Advantages: the strengths of PFN and the transfer modules are analyzed.


The contributions of this work are three-fold:

• We reveal the Content Leak issue of the state-of-the-art style transfer algorithms and identify the three main causes of the Content Leak in AdaIN [20], WCT [32], and Avatar-Net [45].

• We propose an unbiased, lossless, and reversible network named PFN based on neural flows, which allows both theoretical and empirical analyses of the inherent biases of the popular style transfer modules.

• Based on PFN in conjunction with an unbiased style transfer module, we propose a novel style transfer framework, i.e., ArtFlow, which achieves comparable style transfer results to state-of-the-art methods while avoiding the Content Leak issue.


2. Related Work

Neural flows

Neural flows refer to a subclass of deep generative models, which learn the exact likelihood of high-dimensional observations (e.g., natural images, text, and audio) through a chain of reversible transformations. As a pioneering work of neural flows, NICE [5] is proposed to transform low-dimensional densities to high-dimensional observations with a stack of affine coupling layers. Following NICE, a series of neural flows, including RealNVP [6], GLOW [28], and Flow++ [18], are proposed to improve NICE with more powerful and flexible reversible transformations. The recently proposed neural flows [28, 18, 39] are capable of synthesizing high-resolution natural/face images and realistic speech data [43, 26], and of performing makeup transfer [8].

In this work, the proposed ArtFlow consists of a reversible network PFN and an unbiased feature transfer module. The content leak can be addressed via lossless forward and backward inferences and unbiased feature transfer. In comparison, BeautyGlow [8] shares a similar spirit but is not applicable to unbiased style transfer.


3. Pre-analysis

This section examines what the content leak phenomenon is and why it occurs.

Before introducing the proposed ArtFlow, we first make a pre-analysis to uncover the Content Leak phenomenon of the state-of-the-art style transfer algorithms and analyze its causes. We make the aforementioned pre-analysis by answering two questions: what Content Leak is, and why it happens.


3.1. What is Content Leak?

For a style transfer algorithm, Content Leak occurs when the stylization results lose some content information. Although the existing state-of-the-art style transfer algorithms, e.g., AdaIN [20], WCT [32], and Avatar-Net [45], can produce good style transfer results, they still suffer from the Content Leak issue. Since it is hard to directly extract the content information from the stylized image and compare it with the input content image, we adopt an alternative way to show empirical evidence of the Content Leak phenomenon. More specifically, we first perform style transfer with an input content-style pair using a given style transfer algorithm. We then take the stylized image as the new content and repeat the style transfer process 20 times.

Fig. 1 shows the results of our experiments for AdaIN (row 1), WCT (row 2), and Avatar-Net (row 3). According to Fig. 1, when we perform style transfer for 20 rounds, we can hardly recognize any detail of the content image. Such empirical evidence indicates that the Content Leak phenomenon occurs in all of AdaIN, WCT, and Avatar-Net. In the following, we discuss the causes of the Content Leak, which imply that the Content Leak issue also exists in other state-of-the-art style transfer algorithms.


3.2. Why Does Content Leak Happen?

Taking AdaIN [20], WCT [32], and Avatar-Net [45] as representative style transfer algorithms, this subsection studies the causes of the content leak phenomenon.

[20] Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. ICCV, 2017.

[32] Universal Style Transfer via Feature Transforms. NeurIPS, 2017.

[45] Avatar-Net: Multi-Scale Zero-Shot Style Transfer by Feature Decoration. CVPR, 2018.

Reconstruction error

A straightforward explanation of Content Leak is that the decoder of existing style transfer algorithms cannot achieve lossless image reconstruction of the input content image. For example, AdaIN, WCT, and Avatar-Net all adopt VGG19 [47] as the encoder and train a structurally symmetrical decoder to invert the features of VGG19 back to the image space. Although an image reconstruction loss [32] or a content loss [20] is used to train the decoder, Li et al. (WCT) [32] acknowledge that the decoder is far from perfect due to the loss of spatial information brought by the pooling operations in the encoder.

Consequently, the accumulated image reconstruction error may gradually disturb the content details and lead to the Content Leak.


Biased decoder training

This part uses loss curves and image results to demonstrate the content leak problem. (An important lesson: when describing an observation of your own, back it up from multiple angles, especially objective ones such as statistics and curves; a purely verbal description of the phenomenon carries little credibility.)

The above-mentioned reconstruction error can only partially explain the Content Leak phenomenon. In addition, biased decoder training is another cause. We take the training scheme of AdaIN as an example to explain how its loss function settings lead to Content Leak. AdaIN trains the decoder with a weighted combination of a content loss Lc and a style loss Ls, where

L_c = || F(G(t)) - t ||_2,   (1)

L_s = Σ_i ( || µ(φ_i(G(t))) - µ(φ_i(s)) ||_2 + || σ(φ_i(G(t))) - σ(φ_i(s)) ||_2 ).   (2)


Here t denotes the output of the adaptive instance normalization, s denotes the style image, F and G represent the encoder and the decoder, respectively, φ_i denotes a layer in VGG19 used to compute the style loss, and µ, σ represent the mean and standard deviation of feature maps, respectively. Due to Ls, the decoder is trained to trade off between Lc and Ls, rather than trying to reconstruct images perfectly.

Fig. 3 shows the training loss curves of AdaIN with and without Ls. When we train the decoder of AdaIN with only Lc, the converged value of Lc (cyan curve) is significantly smaller than when training with the weighted combination of Lc and Ls (blue curve). Consequently, the auto-encoder of AdaIN is biased towards rendering more artistic effects, which causes Content Leak.

Fig. 4 shows the image reconstruction results obtained by propagating through the auto-encoder of AdaIN for 50 rounds. We take the output of the auto-encoder in the previous round as the input of the next round and perform image reconstruction repeatedly. With the increase of inference rounds, weird artistic patterns gradually appear in the produced results, which indicates that the auto-encoder of AdaIN may memorize image styles during training and be biased towards the training styles at inference.


Biased style transfer module

A biased style transfer module is another cause of the Content Leak. We take the Style Decorator in Avatar-Net as an example. For the normalized content feature fc and style feature fs, the key mechanism of the Style Decorator is motivated by the deep image analogy [35] and is composed of two steps.

In the first step, the algorithm finds a corresponding patch in fs for every patch in fc according to the content similarity between two patches.

In the next step, fcs is formed by replacing patches in fc with the corresponding patches in fs. Since such a patch replacement is irreversible, fc cannot be recovered from fcs, which makes fcs be biased towards style and consequently causes the Content Leak phenomenon.


We summarize and illustrate three main causes of Content Leak in Fig. 5. While the reconstruction error may disturb the content information in the output image, the biased image recovery and the biased transfer module may lead to a style shift in the output image.


4.  Method

4.1. Overview of the ArtFlow Framework

In this work, we present a novel unbiased style transfer framework named ArtFlow to address the Content Leak issue of the state-of-the-art style transfer approaches. Different from the encoder-transfer-decoder scheme commonly used in existing neural style transfer algorithms, ArtFlow performs image style transfer through a projection-transfer-reversion scheme.

As shown in Fig. 6, ArtFlow relies on a reversible neural flow model, named Projection Flow Network (PFN).

In the projection step, the content and style images are fed into PFN for lossless deep feature extraction via the forward propagation of PFN.

In the transfer step, the content and style features are transferred to the stylized feature with an unbiased style transfer module.

In the reversion step, the stylized feature is reconstructed to a stylized image via the reverse propagation of PFN.

Since the information flow in PFN and the unbiased style transfer module are both lossless and unbiased, ArtFlow achieves unbiased image style transfer to avoid the Content Leak.


In the following, we first discuss the details of PFN in Section 4.2. Then, we discuss the choice of the unbiased style transfer module by performing both theoretical and quantitative analyses of the inherent biases of existing transfer modules in Section 4.3.


4.2. Projection Flow Network

Projection Flow Network (PFN) serves as both the deep feature extractor and image synthesizer of our ArtFlow framework. In this work, we construct PFN by following the effective Glow model [28].

As shown in Fig. 6, PFN consists of a chain of three learnable reversible transformations, i.e., additive coupling, invertible 1×1 convolution, and Actnorm.

All the components of PFN are reversible, making PFN fully reversible, so that information is lossless during the forward and reverse propagation.

In the following, we describe the three reversible transformations.


Additive coupling

Dinh et al. [5, 6] proposed an expressive reversible transformation named affine coupling layer. In this work, we adopt a special case of affine coupling, i.e., additive coupling, for PFN. The forward computation of additive coupling is

x_a, x_b = split(x),
y_a = x_a,   y_b = x_b + NN(x_a),   (*)
y = concat(y_a, y_b).

The split() function splits a tensor into two halves along the channel dimension. NN() is (any) neural network where the input and the output have the same shape. The concat() function concatenates two tensors along the channel dimension. The reverse computation of additive coupling can be easily derived.


We observe that a flow model with additive coupling layers is sufficient to handle the style transfer task in experiments. Moreover, the additive coupling is more efficient and stable than the affine coupling in model training. Therefore, we employ additive coupling instead of affine coupling as the expressive transformation layer in PFN.


Invertible 1×1 convolution

Since the additive coupling layer only processes half of the feature maps (x_a in Eq. (*)), it is necessary to permute the channel dimensions of feature maps so that each dimension can affect all the other dimensions [5, 6]. We follow Glow [28] and use a learnable invertible 1×1 convolution layer for flexible channel permutation, as

y_{i,j} = W x_{i,j}.   (3)

W is the weight matrix of shape c × c, where c is the channel dimension of tensors x and y. Its reverse function is x_{i,j} = W^{-1} y_{i,j}.


Actnorm

We follow Glow [28] to use the activation normalization layer (Actnorm) as an alternative to batch normalization [21]. Actnorm performs per-channel affine transformation on tensor x, as

y_{i,j} = w ⊙ x_{i,j} + b,   (4)

where i, j denote a spatial position on the tensor, and w and b are the scale and bias parameters of the affine transformation, which are learnable in model training. The reverse function is x_{i,j} = (y_{i,j} - b) / w.


In addition to the three reversible transformations, the squeeze operation is inserted into certain parts of PFN to reduce the spatial size of 2D feature maps. The squeeze operation splits the features into smaller patches along the spatial dimension and then concatenates the patches along the channel dimension.


4.3. Unbiased Content-Style Separation

Which style transfer module should ArtFlow use to achieve the unbiased style transfer? To answer this question, we first make a theoretical analysis of the biases of two popular style transfer modules, i.e., the adaptive instance normalization in AdaIN, and the whitening and coloring transforms in WCT.


The mechanism of the universal style transfer methods can be regarded as a natural evolution of the bilinear model proposed by Tenenbaum and Freeman in [52], which separates an image into a content factor C and a style factor S and then makes style transfer by replacing the style factor S in the content image with that in the target image. Similarly, the universal style transfer methods assume that the content information and the style information in the deep feature space are disentangled explicitly [20, 32, 31, 40, 2, 1, 56, 22, 60] or implicitly [4, 45]. For example, AdaIN [20] separates deep features into normalized feature maps and mean/std vectors, which can be regarded as the content factor C and style factor S, respectively.


Following the theoretical framework of the Bilinear Model [52], we can define the unbiased style transfer as:

Definition 1

Suppose we have a bilinear style transfer module f_{cs} = C(f_c)S(f_s), where C, S denote the content factor and the style factor in the bilinear model, respectively. f_{cs} is an unbiased style transfer module if C(f_{cs}) = C(f_c) and S(f_{cs}) = S(f_s).

Based on Def. 1, we have the following two theorems.


Theorem 1

The adaptive instance normalization in AdaIN is an unbiased style transfer module.

Theorem 2

The whitening and coloring transform in WCT is an unbiased style transfer module.

The proofs for Theorems 1 and 2 can be found in the supplementary material.

The Style Decorator in Avatar-Net [45] does not fit the bilinear model, while the empirical analysis in Sec. 3.2 shows that Style Decorator is a biased style transfer module.


In addition to the theoretical analyses, we also quantitatively verify the unbiased property of the transfer modules in AdaIN and WCT. Quantitatively studying the property of popular style transfer modules is an unsolved question because the auto-encoder used by existing universal style transfer methods has significant image reconstruction errors and may be biased towards styles as discussed in Sec. 3.2. Consequently, the produced style transfer results using auto-encoders cannot precisely reflect the effects of the style transfer modules upon deep features. The proposed PFN addresses this issue. Specifically, if we take the forward inference and the reverse inference of the proposed PFN as the encoder and decoder, respectively, we can obtain a lossless and unbiased “auto-encoder” for style transfer, which can avoid the influence of the image reconstruction error and the biased image recovery brought by the decoder.


By using the proposed PFN as the lossless feature projector/inverter, we make a quantitative analysis of the content and style reconstruction errors of the transfer modules in AdaIN and WCT. Fig. 7 demonstrates two findings:

1) Considering (a) vs. (b) and (c) vs. (d), the proposed PFN can indeed achieve lossless and unbiased content and style reconstruction, while the auto-encoder based on VGG19 cannot.

2) (b) and (d) quantitatively verify that the transfer modules of AdaIN and WCT are unbiased.


Based on theoretical and quantitative analyses to transfer modules in AdaIN and WCT, we let the adaptive instance normalization and the whitening and coloring transforms be two options for ArtFlow to achieve unbiased style transfer.


Reposted from blog.csdn.net/u014546828/article/details/120764845