Dive into Deep Learning: Image Style Transfer

These study notes follow the "Dive into Deep Learning" (动手学深度学习) course on the Boyu learning platform.
Original link: https://www.boyuai.com/elites/course/cZu18YmweLv10OeV/lesson/3nIhgFs64Fs5g3JEn5EXzm
Thanks to the Boyu platform, Datawhale, Heywhale (和鲸), and AWS for giving us the opportunity to learn for free!
Overall learning experience: Boyu's courses are well made and very systematic. Each more advanced course first reviews the basics it builds on, so it suits students with a weak foundation like me. Students with a weak foundation may also want to look at Boyu's other courses:
Mathematical foundations: https://www.boyuai.com/elites/course/D91JM0bv72Zop1D3
Machine learning basics: https://www.boyuai.com/elites/course/5ICEBwpbHVwwnK3C

If you have any questions, please leave a message in the comments section and I will do my best to answer them!

Introduction

Image style transfer means using an algorithm to learn the style of a famous painting and then applying that style to another picture. The well-known image-processing app Prisma uses style transfer to turn an ordinary user's photo into a picture in the style of an artist automatically. This article introduces the principle behind the technique. The experiments in this article implement style transfer with PyTorch.

Style transfer

If you are a photography enthusiast, you may be familiar with filters. A filter changes the color style of a photo, for example making a landscape look sharper or a portrait look fairer. However, a filter usually changes only one aspect of the photo. To get the style you want, you often need to try many different combinations, which can be as complex as tuning model hyperparameters.

In this section, we describe how to use a convolutional neural network to automatically apply the style of one image to another image, a task known as style transfer [1]. We need two input images: a content image and a style image. We use a neural network to modify the content image so that its style approaches that of the style image. The content image in Figure 9.12 is a landscape photo taken by the book's author in Mount Rainier National Park near Seattle, and the style image is an oil painting of autumn oak trees. The final composite image keeps the overall shapes of the objects in the content image while applying the brush strokes of the style image, and its colors are more vivid overall.

[Figure 9.12: content image, style image, and composite image]

Method

Figure 9.13 illustrates style transfer based on a convolutional neural network with a simple example. First, we initialize the composite image, for example by setting it to the content image. The composite image is the only variable that is updated during style transfer; in other words, it plays the role of the model parameters that are iterated on. We then choose a pretrained convolutional neural network to extract image features; its model parameters are not updated during training. A deep convolutional neural network extracts image features level by level through its layers, and we can choose the outputs of some of these layers as content features or style features. In Figure 9.13, the pretrained network has three convolutional layers: the second layer outputs the content features of the image, while the first and third layers output the style features. Next, we compute the style transfer loss through forward propagation (solid arrows) and update the model parameters, that is, the composite image, through backpropagation (dashed arrows). The loss function commonly used in style transfer has three parts: the content loss brings the composite image close to the content image in content features, the style loss brings the composite image close to the style image in style features, and the total variation loss helps reduce noise in the composite image. Finally, when training ends, we output the model parameters of style transfer, which are the final composite image.
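The idea that the composite image itself plays the role of the model parameters can be tried out on a toy problem. The sketch below is not the network-based method of this section: it assumes a four-pixel "image", uses the raw pixels as content features and the pixel mean as a stand-in style feature, and runs plain gradient descent. All names and numbers are hypothetical.

```python
# Toy illustration: the "image" x is the only thing gradient descent updates.
content = [0.2, 0.4, 0.6, 0.8]   # hypothetical content image (4 pixels)
style_mean = 0.9                 # hypothetical style statistic to match
content_weight, style_weight = 1.0, 4.0
lr = 0.1

def loss_and_grad(x):
    n = len(x)
    mean = sum(x) / n
    # content loss: mean squared error against the content pixels
    content_l = sum((xi - ci) ** 2 for xi, ci in zip(x, content)) / n
    # stand-in style loss: squared gap between the image mean and the style mean
    style_l = (mean - style_mean) ** 2
    loss = content_weight * content_l + style_weight * style_l
    # gradient of the weighted loss with respect to every pixel
    grad = [content_weight * 2 * (xi - ci) / n
            + style_weight * 2 * (mean - style_mean) / n
            for xi, ci in zip(x, content)]
    return loss, grad

x = list(content)                # initialize the composite image with the content image
initial_loss, _ = loss_and_grad(x)
for _ in range(200):
    _, grad = loss_and_grad(x)
    x = [xi - lr * gi for xi, gi in zip(x, grad)]   # only the image changes
final_loss, _ = loss_and_grad(x)
print(initial_loss, final_loss)
```

In this toy setting every pixel is shifted by the same amount, so the differences between neighboring pixels (the "content") are preserved while the mean (the "style") moves toward its target. The two weights decide where the compromise ends up.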

[Figure 9.13: style transfer based on a convolutional neural network]

Below, we look at the technical details of style transfer through an experiment. The experiment needs the following packages and modules.

%matplotlib inline
import time
import torch
import torch.nn.functional as F
import torchvision
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

import sys
sys.path.append("/home/kesci/input")
import d2len9900 as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # both devices have been tested

print(device, torch.__version__)

Reading the content image and the style image

First, we read the content image and the style image. As the axes of the plotted images show, their sizes differ.

# d2l.set_figsize()
content_img = Image.open('/home/kesci/input/NeuralStyle5603/rainier.jpg')
plt.imshow(content_img);

style_img = Image.open('/home/kesci/input/NeuralStyle5603/autumn_oak.jpg')
plt.imshow(style_img);

Image preprocessing and postprocessing

The preprocessing and postprocessing functions are defined below. The preprocessing function preprocess normalizes each of the three RGB channels of the input image and converts the result into the input format that the convolutional neural network accepts. The postprocessing function postprocess restores the pixel values of the output image to their values before normalization. Because the image display function requires each pixel to be a floating-point value between 0 and 1, we use the clamp function to replace values smaller than 0 with 0 and values greater than 1 with 1.

rgb_mean = np.array([0.485, 0.456, 0.406])
rgb_std = np.array([0.229, 0.224, 0.225])

def preprocess(PIL_img, image_shape):
    # resize, convert to a tensor in [0, 1], then normalize each RGB channel
    process = torchvision.transforms.Compose([
        torchvision.transforms.Resize(image_shape),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=rgb_mean, std=rgb_std)])

    return process(PIL_img).unsqueeze(dim=0)  # (batch_size, 3, H, W)

def postprocess(img_tensor):
    # normalizing with mean=-mean/std and std=1/std undoes the earlier normalization
    inv_normalize = torchvision.transforms.Normalize(
        mean=-rgb_mean / rgb_std,
        std=1 / rgb_std)
    to_PIL_image = torchvision.transforms.ToPILImage()
    return to_PIL_image(inv_normalize(img_tensor[0].cpu()).clamp(0, 1))
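
Why does normalizing a second time with mean -mean/std and standard deviation 1/std restore the original pixel values? The following dependency-free sketch works through the arithmetic for a single RGB pixel. The helper names normalize and clamp and the sample pixel are assumptions made for illustration; they only mimic the channel-wise arithmetic and are not the torchvision implementation.

```python
rgb_mean = [0.485, 0.456, 0.406]
rgb_std = [0.229, 0.224, 0.225]

def normalize(pixel, mean, std):
    # channel-wise (value - mean) / std
    return [(p - m) / s for p, m, s in zip(pixel, mean, std)]

def clamp(pixel, lo=0.0, hi=1.0):
    # values below lo become lo, values above hi become hi
    return [min(max(p, lo), hi) for p in pixel]

# parameters of the inverse transform, as used in postprocess
inv_mean = [-m / s for m, s in zip(rgb_mean, rgb_std)]
inv_std = [1 / s for s in rgb_std]

pixel = [0.25, 0.50, 0.75]                   # hypothetical RGB pixel in [0, 1]
z = normalize(pixel, rgb_mean, rgb_std)      # what preprocess does to each pixel
restored = normalize(z, inv_mean, inv_std)   # (z + m/s) * s = z * s + m
print(restored)

# an optimized image can leave the valid range, so the result is clamped
out_of_range = normalize([-3.0, 0.0, 3.0], inv_mean, inv_std)
print(clamp(out_of_range))
```

The second call computes (z + mean/std) * std = z * std + mean, which is exactly the inverse of (x - mean) / std.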

Extracting features

We use a VGG-19 model pretrained on the ImageNet dataset to extract image features [1].

!echo $TORCH_HOME  # pretrained models are downloaded here (defaults to .cache/torch if nothing is printed)
pretrained_net = torchvision.models.vgg19(pretrained=False)
pretrained_net.load_state_dict(torch.load('/home/kesci/input/vgg193427/vgg19-dcbb9e9d.pth'))

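To extract the content features and style features of an image, we can choose the outputs of certain layers of the VGG network. In general, outputs closer to the input layer capture more of the image's detail, while outputs farther from it capture more of its global information. To keep the composite image from retaining too much detail of the content image, we choose a VGG layer closer to the output, called the content layer, to output the content features. We also choose the outputs of several different VGG layers to match local and global styles; these are called style layers. As described in the section "Networks Using Repeating Elements (VGG)", the VGG network uses 5 convolutional blocks. In the experiment, we choose the last convolutional layer of the fourth block as the content layer and the first convolutional layer of each block as the style layers. The indices of these layers can be obtained by printing the pretrained_net instance.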

style_layers, content_layers = [0, 5, 10, 19, 28], [25]
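
These indices can also be derived without printing the network. The sketch below assumes the layer layout of torchvision's VGG-19 without batch normalization (configuration "E"), where every convolution is followed by a ReLU and every block ends with a max-pooling layer. The variable names are chosen for illustration.

```python
# Layer layout of torchvision's VGG-19 `features` (configuration "E"):
# numbers are convolution output channels, 'M' is a max-pooling layer.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
       512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']

blocks, current, index = [], [], 0
for item in cfg:
    if item == 'M':
        blocks.append(current)   # pooling closes the current block
        current = []
        index += 1               # the pooling layer occupies one index
    else:
        current.append(index)    # index of the convolutional layer
        index += 2               # convolution followed by ReLU

style_layers = [block[0] for block in blocks]   # first conv of each block
content_layers = [blocks[3][-1]]                # last conv of the fourth block
print(style_layers, content_layers)   # [0, 5, 10, 19, 28] [25]
```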

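When extracting features, we only need the VGG layers from the input layer up to the content or style layer closest to the output. Below we build a new network, net, that keeps only the VGG layers we need. We will use net to extract features.

net_list = []
for i in range(max(content_layers + style_layers) + 1):
    net_list.append(pretrained_net.features[i])
net = torch.nn.Sequential(*net_list)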

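Given an input X, simply calling the forward computation net(X) only returns the output of the last layer. Because we also need the outputs of intermediate layers, we compute layer by layer and keep the outputs of the content and style layers.

def extract_features(X, content_layers, style_layers):
    contents = []
    styles = []
    for i in range(len(net)):
        X = net[i](X)  # apply layer i to the output of the previous layer
        if i in style_layers:
            styles.append(X)
        if i in content_layers:
            contents.append(X)
    return contents, styles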
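
The layer-by-layer pattern can be checked by hand on a stand-in "network". In the sketch below the layers are four hypothetical scalar functions rather than VGG layers, and the function and argument names are made up for illustration.

```python
# Hypothetical stand-in "network": simple functions applied in order.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]

def extract(x, content_ids, style_ids):
    contents, styles = [], []
    for i, layer in enumerate(layers):
        x = layer(x)   # the output of layer i becomes the input of layer i + 1
        if i in style_ids:
            styles.append(x)
        if i in content_ids:
            contents.append(x)
    return contents, styles

contents, styles = extract(1, content_ids=[2], style_ids=[0, 3])
print(contents, styles)   # [1] [2, 1]
```

Starting from 1, the intermediate values are 2, 4, 1, 1, so layer 2 contributes the content value 1 and layers 0 and 3 contribute the style values 2 and 1.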
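
Next we define two functions: get_contents extracts the content features of the content image, and get_styles extracts the style features of the style image. Because the parameters of the pretrained VGG do not change during training, we can extract the content features of the content image and the style features of the style image before training starts. The composite image, by contrast, is the set of model parameters that style transfer iterates on, so its content and style features can only be extracted during training by calling extract_features.

def get_contents(image_shape, device):
    content_X = preprocess(content_img, image_shape).to(device)
    contents_Y, _ = extract_features(content_X, content_layers, style_layers)
    return content_X, contents_Y

def get_styles(image_shape, device):
    style_X = preprocess(style_img, image_shape).to(device)
    _, styles_Y = extract_features(style_X, content_layers, style_layers)
    return style_X, styles_Y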

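Defining the loss function

We now describe the loss function of style transfer. It consists of three parts: the content loss, the style loss, and the total variation loss.

Content loss

Like the loss function in linear regression, the content loss uses the squared-error function to measure the difference in content features between the composite image and the content image. Both inputs of the squared-error function are content-layer outputs computed by extract_features.

def content_loss(Y_hat, Y):
    return F.mse_loss(Y_hat, Y)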

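Style loss

Like the content loss, the style loss uses the squared-error function to measure the difference in style between the composite image and the style image. To express the style output by a style layer, we first compute the style layer's output with extract_features. Suppose this output has 1 example, $c$ channels, height $h$, and width $w$. We can reshape it into a matrix $\boldsymbol{X}$ with $c$ rows and $hw$ columns. The matrix $\boldsymbol{X}$ can be viewed as consisting of $c$ vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_c$, each of length $hw$, where the vector $\boldsymbol{x}_i$ represents the style features of channel $i$. In the Gram matrix of these vectors, $\boldsymbol{X}\boldsymbol{X}^\top \in \mathbb{R}^{c \times c}$, the element $x_{ij}$ in row $i$ and column $j$ is the inner product of the vectors $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$; it expresses the correlation between the style features of channel $i$ and channel $j$. We use this Gram matrix to express the style output by a style layer. Note that when $hw$ is large, the elements of the Gram matrix tend to take large values. In addition, the height and width of the Gram matrix both equal the number of channels $c$. To keep the style loss from being affected by the magnitude of these values, the gram function defined below divides the Gram matrix by the number of elements in the matrix $\boldsymbol{X}$, that is, $chw$.

def gram(X):
    num_channels, n = X.shape[1], X.shape[2] * X.shape[3]
    X = X.view(num_channels, n)  # c rows, h*w columns (batch size 1)
    return torch.matmul(X, X.t()) / (num_channels * n)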
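
To make the normalized Gram matrix concrete, here is a small worked example in plain Python. It is only a sketch: the nested-list feature map with $c = 2$, $h = w = 2$ is made up, and the function mirrors the arithmetic of gram above without using tensors.

```python
def gram(feature):
    """feature: list of c channels, each an h x w nested list."""
    c = len(feature)
    # flatten every channel into a row of length h*w
    rows = [[v for row in channel for v in row] for channel in feature]
    n = len(rows[0])
    # inner products between channels, divided by the element count c*h*w
    return [[sum(a * b for a, b in zip(rows[i], rows[j])) / (c * n)
             for j in range(c)] for i in range(c)]

feature = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.0, 1.0], [0.0, 1.0]],   # channel 1
]
G = gram(feature)
print(G)   # [[3.75, 0.75], [0.75, 0.25]]
```

The unnormalized inner products are 30, 6, and 2; dividing by $chw = 8$ gives the values shown. The result is symmetric and has shape $c \times c$ regardless of the spatial size.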
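
Naturally, the two Gram matrix inputs of the squared-error function for the style loss are based on the style-layer outputs of the composite image and of the style image, respectively. Here we assume that the Gram matrix gram_Y of the style image has been computed in advance.

def style_loss(Y_hat, gram_Y):
    return F.mse_loss(gram(Y_hat), gram_Y)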

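Total variation loss

Sometimes the composite image we learn contains a lot of high-frequency noise, that is, particularly bright or particularly dark pixels. A common noise-reduction method is total variation denoising. Suppose $x_{i,j}$ denotes the pixel value at coordinate $(i,j)$. Reducing the total variation loss

$$\sum_{i,j} \left|x_{i,j} - x_{i+1,j}\right| + \left|x_{i,j} - x_{i,j+1}\right|$$

makes neighboring pixel values as similar as possible.

def tv_loss(Y_hat):
    # mean absolute difference between vertical and horizontal neighbors
    return 0.5 * (F.l1_loss(Y_hat[:, :, 1:, :], Y_hat[:, :, :-1, :]) +
                  F.l1_loss(Y_hat[:, :, :, 1:], Y_hat[:, :, :, :-1]))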
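
Note that tv_loss averages the neighbor differences and multiplies by 0.5, whereas the formula sums them; the two differ only in scale. The sketch below compares both on a made-up single-channel 3 × 3 image with one noisy pixel. The function names tv_sum and tv_mean are hypothetical.

```python
def neighbor_diffs(img):
    h, w = len(img), len(img[0])
    # absolute differences between vertically and horizontally adjacent pixels
    vertical = [abs(img[i][j] - img[i + 1][j]) for i in range(h - 1) for j in range(w)]
    horizontal = [abs(img[i][j] - img[i][j + 1]) for i in range(h) for j in range(w - 1)]
    return vertical, horizontal

def tv_sum(img):
    # total variation as in the formula: the sum of all neighbor differences
    vertical, horizontal = neighbor_diffs(img)
    return sum(vertical) + sum(horizontal)

def tv_mean(img):
    # averaged variant: 0.5 * (mean vertical difference + mean horizontal difference)
    vertical, horizontal = neighbor_diffs(img)
    return 0.5 * (sum(vertical) / len(vertical) + sum(horizontal) / len(horizontal))

smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
noisy = [[1, 1, 1], [1, 5, 1], [1, 1, 1]]   # one bright outlier pixel
print(tv_sum(smooth), tv_sum(noisy))        # 0 16
print(tv_mean(smooth), tv_mean(noisy))
```

A constant image has zero total variation, while a single outlier pixel raises it through all four of its neighbors, which is why minimizing this term suppresses isolated bright or dark pixels.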

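Loss function

The loss function of style transfer is the weighted sum of the content loss, the style loss, and the total variation loss. By adjusting these weight hyperparameters, we can balance the relative importance of preserving content, transferring style, and reducing noise in the composite image.

content_weight, style_weight, tv_weight = 1, 1e3, 10

def compute_loss(X, contents_Y_hat, styles_Y_hat, contents_Y, styles_Y_gram):
    # compute the content, style, and total variation losses separately
    contents_l = [content_loss(Y_hat, Y) * content_weight for Y_hat, Y in zip(
        contents_Y_hat, contents_Y)]
    styles_l = [style_loss(Y_hat, Y) * style_weight for Y_hat, Y in zip(
        styles_Y_hat, styles_Y_gram)]
    tv_l = tv_loss(X) * tv_weight
    # sum all the losses
    l = sum(styles_l) + sum(contents_l) + tv_l
    return contents_l, styles_l, tv_l, l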

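Creating and initializing the composite image

In style transfer, the composite image is the only variable that needs to be updated. We can therefore define a simple model, GeneratedImage, and treat the composite image as its model parameter. The forward computation of the model just returns the model parameter.

class GeneratedImage(torch.nn.Module):
    def __init__(self, img_shape):
        super(GeneratedImage, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(*img_shape))

    def forward(self):
        return self.weight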

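Next, we define the get_inits function. It creates a model instance for the composite image and initializes it to the image X. The Gram matrices of the style image at each style layer, styles_Y_gram, are computed in advance before training.

def get_inits(X, device, lr, styles_Y):
    gen_img = GeneratedImage(X.shape).to(device)
    gen_img.weight.data = X.data  # start from the given image
    optimizer = torch.optim.Adam(gen_img.parameters(), lr=lr)
    styles_Y_gram = [gram(Y) for Y in styles_Y]
    return gen_img(), styles_Y_gram, optimizer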

Training

During training, we repeatedly extract the content and style features of the composite image and compute the loss function.

def train(X, contents_Y, styles_Y, device, lr, max_epochs, lr_decay_epoch):
    print("training on ", device)
    X, styles_Y_gram, optimizer = get_inits(X, device, lr, styles_Y)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, lr_decay_epoch, gamma=0.1)
    for i in range(max_epochs):
        start = time.time()

        contents_Y_hat, styles_Y_hat = extract_features(
                X, content_layers, style_layers)
        contents_l, styles_l, tv_l, l = compute_loss(
                X, contents_Y_hat, styles_Y_hat, contents_Y, styles_Y_gram)

        optimizer.zero_grad()
        l.backward(retain_graph=True)
        optimizer.step()
        scheduler.step()

        if i % 50 == 0 and i != 0:
            print('epoch %3d, content loss %.2f, style loss %.2f, '
                  'TV loss %.2f, %.2f sec'
                  % (i, sum(contents_l).item(), sum(styles_l).item(), tv_l.item(),
                     time.time() - start))
    return X.detach()

Now we start training the model. First we resize the content image and the style image to a height of 150 pixels and a width of 225 pixels. The composite image is initialized with the content image.

image_shape = (150, 225)
net = net.to(device)
content_X, contents_Y = get_contents(image_shape, device)
style_X, styles_Y = get_styles(image_shape, device)
output = train(content_X, contents_Y, styles_Y, device, 0.01, 500, 200)
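
The last three arguments of train are the learning rate, the number of epochs, and the decay interval. With gamma=0.1, a step-decay schedule multiplies the learning rate by 0.1 every lr_decay_epoch epochs. The sketch below reproduces that arithmetic with a hypothetical helper, step_lr, using the closed form lr * gamma ** (epoch // step_size) rather than calling PyTorch.

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    # learning rate in effect at `epoch` under a step-decay schedule
    return base_lr * gamma ** (epoch // step_size)

lr, max_epochs, lr_decay_epoch = 0.01, 500, 200
for epoch in (0, 199, 200, 399, 400, max_epochs - 1):
    print(epoch, step_lr(lr, epoch, lr_decay_epoch))
```

Under these settings the first 200 epochs use a learning rate of about 0.01, the next 200 about 0.001, and the final 100 about 0.0001, so the later epochs only make small refinements to the image.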

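Below we display the trained composite image. As Figure 9.14 shows, the composite image keeps the scenery and objects of the content image while also transferring the colors of the style image. Because the image is small, the details are still fairly blurry.

plt.imshow(postprocess(output));

[Figure 9.14: composite image at 150 × 225 pixels]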

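To obtain a sharper composite image, we next train at a larger size of $300 \times 450$. We enlarge the height and width of the image in Figure 9.14 by a factor of 2 to initialize the larger composite image.

image_shape = (300, 450)
_, content_Y = get_contents(image_shape, device)
_, style_Y = get_styles(image_shape, device)
X = preprocess(postprocess(output), image_shape).to(device)
big_output = train(X, content_Y, style_Y, device, 0.01, 500, 200)

plt.imshow(postprocess(big_output));

As we can see, each iteration takes more time because the image is larger. Figure 9.15 shows that the larger composite image retains more detail: it contains not only large blocks of color resembling the oil painting in the style image, but also fine texture within those blocks.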

[Figure 9.15: composite image at 300 × 450 pixels]

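Summary

  • The loss function commonly used in style transfer has three parts: the content loss brings the composite image close to the content image in content features, the style loss brings the composite image close to the style image in style features, and the total variation loss helps reduce noise in the composite image.
  • A pretrained convolutional neural network can be used to extract image features, and the composite image is updated continually by minimizing the loss function.
  • The Gram matrix is used to express the style output by a style layer.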

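Exercises

  • How does the output change if you choose different content and style layers?
  • If you adjust the weight hyperparameters in the loss function, does the output keep more content or contain less noise?
  • Replace the content image and the style image in the experiment. Can you create a more interesting composite image?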

References

[1] Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414-2423).


Origin blog.csdn.net/water19111213/article/details/104494065