DataWhale Team Learning Camp Task 09-2: Image Style Transfer

Style transfer
If you are a photography enthusiast, you may have come across filters. A filter can change the color style of a photo, for example making a landscape look sharper or a portrait look whiter. But a filter usually changes only one aspect of a photo. To give a photo exactly the style you want, you often need to try many different combinations, which is as complex as tuning a model's hyperparameters.

In this section, we describe how to use a convolutional neural network to automatically apply the style of one image to another image, a technique known as style transfer [1]. We need two input images: a content image and a style image. We will use a neural network to modify the content image so that it becomes close to the style image in style. The content image below (left) is a landscape photo the author took in Mount Rainier National Park near Seattle, while the style image (right) is an oil painting with an autumn-oak theme. In the final synthesized output image, the oil-painting brush strokes of the style image are applied while the shapes of the main objects in the content image are retained, and the overall colors become more vivid.
[Figure: content image (left) and style image (right)]
Method
The figure below illustrates the CNN-based style transfer method with an example. First, we initialize the synthesized image, for example as a copy of the content image. This synthesized image is the only variable that needs to be updated during style transfer, i.e., it plays the role of the model parameters to be iterated. We then choose a pre-trained convolutional neural network to extract image features; its parameters do not need to be updated during training. The deep CNN extracts image features level by level, and we can select the outputs of certain layers as content features or style features. Taking the figure below as an example, the pre-trained network contains three convolutional layers: the output of the second layer is used as the content feature, and the outputs of the first and third layers are used as style features. Next, we compute the style transfer loss function through forward propagation (direction of the solid arrows) and iterate the model parameters, i.e., keep updating the synthesized image, through backpropagation (direction of the dashed arrows). The loss function commonly used in style transfer has three parts: the content loss makes the synthesized image and the content image close in content features, the style loss makes the synthesized image and the style image close in style features, and the total variation loss helps reduce noise in the synthesized image. Finally, when training ends, we output the style transfer model parameters to obtain the final synthesized image.
[Figure: CNN-based style transfer. Solid arrows show forward propagation; dashed arrows show backpropagation.]
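
In shorthand notation, the overall objective minimized with respect to the synthesized image can be summarized as a weighted sum of the three losses (the weights correspond to content_weight, style_weight and tv_weight defined further below):

$$L = w_{\text{content}} \, L_{\text{content}} + w_{\text{style}} \, L_{\text{style}} + w_{\text{tv}} \, L_{\text{tv}}$$
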
Below, we walk through the technical details of style transfer in an experiment. The experiment needs to import the following packages and modules.

%matplotlib inline
import time
import torch
import torch.nn.functional as F
import torchvision
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

import sys
sys.path.append("/home/kesci/input") 
import d2len9900 as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # both have been tested

print(device, torch.__version__)#cuda 1.1.0

Reading the content image and style image
First, we read the content image and the style image. As can be seen from the printed axes, the two images have different sizes.

#d2l.set_figsize()
content_img = Image.open('/home/kesci/input/NeuralStyle5603/rainier.jpg')
plt.imshow(content_img);

[Figure: the content image]

style_img = Image.open('/home/kesci/input/NeuralStyle5603/autumn_oak.jpg')
plt.imshow(style_img);

[Figure: the style image]
Image preprocessing and postprocessing
Below we define the image preprocessing and postprocessing functions. The preprocess function standardizes each of the three RGB channels of the input image and converts the result into the input format accepted by the convolutional neural network. The postprocess function restores the pixel values of the output image to the values before standardization. Because the image-plotting function requires every pixel to be a floating-point value between 0 and 1, we use the clamp function to set values smaller than 0 to 0 and values greater than 1 to 1.

rgb_mean = np.array([0.485, 0.456, 0.406])
rgb_std = np.array([0.229, 0.224, 0.225])

def preprocess(PIL_img, image_shape):
    process = torchvision.transforms.Compose([
        torchvision.transforms.Resize(image_shape),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=rgb_mean, std=rgb_std)])

    return process(PIL_img).unsqueeze(dim = 0) # (batch_size, 3, H, W)

def postprocess(img_tensor):
    inv_normalize = torchvision.transforms.Normalize(
        mean= -rgb_mean / rgb_std,
        std= 1/rgb_std)
    to_PIL_image = torchvision.transforms.ToPILImage()
    return to_PIL_image(inv_normalize(img_tensor[0].cpu()).clamp(0, 1))
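
A quick sanity check of these two functions (a minimal sketch, not part of the original code, assuming content_img has been loaded as above): preprocess should return a 4-D tensor and postprocess should map it back to a PIL image.

x = preprocess(content_img, (150, 225))  # tensor of shape (1, 3, 150, 225)
print(x.shape, x.dtype)                  # torch.Size([1, 3, 150, 225]) torch.float32
img_back = postprocess(x)                # a PIL image close to the resized original
print(img_back.size)                     # (225, 150): PIL reports (width, height)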

Feature Extraction
We use the VGG-19 model pre-trained on the ImageNet dataset to extract image features [1].

!echo $TORCH_HOME # pre-trained models are downloaded here (defaults to .cache/torch if nothing is printed)
pretrained_net = torchvision.models.vgg19(pretrained=False)
pretrained_net.load_state_dict(torch.load('/home/kesci/input/vgg193427/vgg19-dcbb9e9d.pth'))

IncompatibleKeys(missing_keys=[], unexpected_keys=[])

To extract the content features and style features of an image, we can select the outputs of certain VGG layers. In general, the closer a layer's output is to the input layer, the more easily it captures detailed information of the image; the farther away, the more easily it captures global information. To prevent the synthesized image from retaining too many details of the content image, we choose a VGG layer closer to the output, called the content layer, to output the image's content features. We also select the outputs of several different VGG layers to match both local and global styles; these layers are called style layers. As introduced in the section "Networks Using Blocks (VGG)", the VGG network consists of five convolutional blocks. In this experiment, we choose the last convolutional layer of the fourth block as the content layer, and the first convolutional layer of each block as a style layer. The indices of these layers can be obtained by printing the pretrained_net instance.

style_layers, content_layers = [0, 5, 10, 19, 28], [25]
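
As mentioned above, these indices can be verified by printing the feature extractor; a minimal check (not part of the original code):

for i in sorted(style_layers + content_layers):
    print(i, pretrained_net.features[i])
# Indices 0, 5, 10, 19 and 28 are the first Conv2d of each of the five blocks,
# and index 25 is the last Conv2d of the fourth block.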

When extracting features, we only need the VGG layers from the input layer up to the content layer or style layer that is closest to the output. Below we build a new network, net, which retains only the VGG layers to be used. We will use net to extract features.

net_list = []
for i in range(max(content_layers + style_layers) + 1):
    net_list.append(pretrained_net.features[i])
net = torch.nn.Sequential(*net_list)

Given the input X, simply calling the forward computation net(X) would only give us the output of the last layer. Since we also need the outputs of intermediate layers, we compute layer by layer here and keep the outputs of the content layer and the style layers.

def extract_features(X, content_layers, style_layers):
    contents = []
    styles = []
    for i in range(len(net)):
        X = net[i](X)
        if i in style_layers:
            styles.append(X)
        if i in content_layers:
            contents.append(X)
    return contents, styles
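
As an illustration of what extract_features returns (a sketch with a random input, not part of the original code; run it before net is moved to the GPU, or move the input to the same device as net):

dummy = torch.rand(1, 3, 150, 225)  # stand-in for a preprocessed image
contents, styles = extract_features(dummy, content_layers, style_layers)
print([t.shape for t in contents])  # one feature map, from layer 25
print([t.shape for t in styles])    # five feature maps, from layers 0, 5, 10, 19, 28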

The two functions defined below, get_contents and get_styles, extract the content features from the content image and the style features from the style image, respectively. Because the parameters of the pre-trained VGG are not changed during training, we can extract the content features of the content image and the style features of the style image before training begins. Since the synthesized image is the model parameter that style transfer iterates over, its content features and style features can only be extracted during training, by calling the extract_features function.

def get_contents(image_shape, device):
    content_X = preprocess(content_img, image_shape).to(device)
    contents_Y, _ = extract_features(content_X, content_layers, style_layers)
    return content_X, contents_Y

def get_styles(image_shape, device):
    style_X = preprocess(style_img, image_shape).to(device)
    _, styles_Y = extract_features(style_X, content_layers, style_layers)
    return style_X, styles_Y

Defining the loss function
Next we describe the loss function used in style transfer. It consists of three parts: the content loss, the style loss, and the total variation loss.

Content loss
Similar to the loss function in linear regression, the content loss measures the difference between the synthesized image and the content image in terms of content features, using the squared-error function. Both inputs of the squared-error function are outputs of the content layer computed by the extract_features function.

def content_loss(Y_hat, Y):
    return F.mse_loss(Y_hat, Y)

Style loss
Like the content loss, the style loss also uses the squared-error function to measure the difference between the synthesized image and the style image in terms of style. To express the style captured by a style layer, we first compute that layer's output with the extract_features function. Suppose the output has 1 sample, c channels, height h, and width w. We can reshape it into a matrix X with c rows and hw columns, which can be viewed as c vectors x_1, ..., x_c, each of length hw, where x_i represents the style feature on channel i. The element in row i and column j of the Gram matrix XX^T ∈ R^{c×c} of these vectors is the inner product of x_i and x_j, and it expresses the correlation between the style features of channels i and j. We use such a Gram matrix to express the style of a style layer's output. Note that when hw is large, the entries of the Gram matrix tend to be large; moreover, the height and width of the Gram matrix are both the number of channels c. To keep the style loss from being affected by the size of these values, the gram function defined below divides the Gram matrix by the number of elements of the reshaped matrix, i.e., chw.

def gram(X):
    num_channels, n = X.shape[1], X.shape[2] * X.shape[3]
    X = X.view(num_channels, n)
    return torch.matmul(X, X.t()) / (num_channels * n)
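
A quick illustration of the gram function with made-up shapes (not part of the original code): the Gram matrix is c × c regardless of the spatial size of the feature map.

feat = torch.rand(1, 64, 32, 48)  # a fake feature map: 1 sample, 64 channels, 32x48 spatial size
G = gram(feat)
print(G.shape)                    # torch.Size([64, 64])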

Naturally, the two Gram matrix inputs of the squared-error function for the style loss are based on the style-layer outputs of the synthesized image and of the style image, respectively. Here we assume that gram_Y, the Gram matrix based on the style image, has already been computed.

def style_loss(Y_hat, gram_Y):
    return F.mse_loss(gram(Y_hat), gram_Y)

Total variation loss
Sometimes the synthesized image we learn contains a lot of high-frequency noise, i.e., pixels that are particularly dark or particularly bright. A common denoising method is total variation denoising. Let x_{i,j} denote the pixel value at coordinate (i, j). Reducing the total variation loss

$$\sum_{i, j} \left| x_{i,j} - x_{i+1,j} \right| + \left| x_{i,j} - x_{i,j+1} \right|$$

makes neighboring pixel values as similar as possible.

def tv_loss(Y_hat):
    return 0.5 * (F.l1_loss(Y_hat[:, :, 1:, :], Y_hat[:, :, :-1, :]) + 
                  F.l1_loss(Y_hat[:, :, :, 1:], Y_hat[:, :, :, :-1]))

Loss function
The loss function of style transfer is the weighted sum of the content loss, the style loss, and the total variation loss. By adjusting these weight hyperparameters, we can trade off among how much content the synthesized image retains, how much style is transferred, and how much noise is reduced.

content_weight, style_weight, tv_weight = 1, 1e3, 10

def compute_loss(X, contents_Y_hat, styles_Y_hat, contents_Y, styles_Y_gram):
    # Compute the content loss, style loss, and total variation loss separately
    contents_l = [content_loss(Y_hat, Y) * content_weight for Y_hat, Y in zip(
        contents_Y_hat, contents_Y)]
    styles_l = [style_loss(Y_hat, Y) * style_weight for Y_hat, Y in zip(
        styles_Y_hat, styles_Y_gram)]
    tv_l = tv_loss(X) * tv_weight
    # Add up all the losses
    l = sum(styles_l) + sum(contents_l) + tv_l
    return contents_l, styles_l, tv_l, l

Creating and initializing the synthesized image
In style transfer, the synthesized image is the only variable that needs to be updated. Therefore, we can define a simple model, GeneratedImage, and treat the synthesized image as its model parameter. The forward computation of this model simply returns the model parameter.

class GeneratedImage(torch.nn.Module):
    def __init__(self, img_shape):
        super(GeneratedImage, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(*img_shape))

    def forward(self):
        return self.weight

Below, we define the get_inits function. It creates an instance of the synthesized-image model and initializes it to the image X. The Gram matrices of the style image at each style layer, styles_Y_gram, are computed before training.

def get_inits(X, device, lr, styles_Y):
    gen_img = GeneratedImage(X.shape).to(device)
    gen_img.weight.data = X.data
    optimizer = torch.optim.Adam(gen_img.parameters(), lr=lr)
    styles_Y_gram = [gram(Y) for Y in styles_Y]
    return gen_img(), styles_Y_gram, optimizer

Training
When training the model, we repeatedly extract the content features and style features of the synthesized image and compute the loss function.

def train(X, contents_Y, styles_Y, device, lr, max_epochs, lr_decay_epoch):
    print("training on ", device)
    X, styles_Y_gram, optimizer = get_inits(X, device, lr, styles_Y)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, lr_decay_epoch, gamma=0.1)
    for i in range(max_epochs):
        start = time.time()
        
        contents_Y_hat, styles_Y_hat = extract_features(
                X, content_layers, style_layers)
        contents_l, styles_l, tv_l, l = compute_loss(
                X, contents_Y_hat, styles_Y_hat, contents_Y, styles_Y_gram)
        
        optimizer.zero_grad()
        l.backward(retain_graph = True)
        optimizer.step()
        scheduler.step()
        
        if i % 50 == 0 and i != 0:
            print('epoch %3d, content loss %.2f, style loss %.2f, '
                  'TV loss %.2f, %.2f sec'
                  % (i, sum(contents_l).item(), sum(styles_l).item(), tv_l.item(),
                     time.time() - start))
    return X.detach()

Now we start training the model. First, we resize the content image and the style image to a height of 150 pixels and a width of 225 pixels. The synthesized image is initialized with the content image.

image_shape =  (150, 225)
net = net.to(device)
content_X, contents_Y = get_contents(image_shape, device)
style_X, styles_Y = get_styles(image_shape, device)
output = train(content_X, contents_Y, styles_Y, device, 0.01, 500, 200)

training on cuda
epoch 50, content loss 0.24, style loss 1.11, TV loss 1.33, 0.29 sec
epoch 100, content loss 0.24, style loss 0.81, TV loss 1.20, 0.29 sec
epoch 150, content loss 0.23, style loss 0.73, TV loss 1.12, 0.29 sec
epoch 200, content loss 0.23, style loss 0.68, TV loss 1.06, 0.29 sec
epoch 250, content loss 0.23, style loss 0.68, TV loss 1.05, 0.29 sec
epoch 300, content loss 0.23, style loss 0.67, TV loss 1.04, 0.29 sec
epoch 350, content loss 0.23, style loss 0.67, TV loss 1.04, 0.28 sec
epoch 400, content loss 0.23, style loss 0.67, TV loss 1.03, 0.29 sec
epoch 450, content loss 0.23, style loss 0.67, TV loss 1.03, 0.29 sec

Below we display the trained synthesized image. As can be seen in the figure, it retains the scenery and objects of the content image while transferring the colors of the style image. Because the image is small, the details are still somewhat blurry.

plt.imshow(postprocess(output));

[Figure: synthesized image at 150 × 225]
To obtain a clearer synthesized image, we next train on a larger 300 × 450 size. We double the height and width of the image above and use it to initialize the larger synthesized image.

image_shape = (300, 450)
_, content_Y = get_contents(image_shape, device)
_, style_Y = get_styles(image_shape, device)
X = preprocess(postprocess(output), image_shape).to(device)
big_output = train(X, content_Y, style_Y, device, 0.01, 500, 200)

training on cuda
epoch 50, content loss 0.34, style loss 0.63, TV loss 0.79, 0.91 sec
epoch 100, content loss 0.30, style loss 0.50, TV loss 0.74, 0.92 sec
epoch 150, content loss 0.29, style loss 0.46, TV loss 0.72, 0.92 sec
epoch 200, content loss 0.28, style loss 0.43, TV loss 0.70, 0.92 sec
epoch 250, content loss 0.27, style loss 0.43, TV loss 0.69, 0.92 sec
epoch 300, content loss 0.27, style loss 0.42, TV loss 0.69, 0.92 sec
epoch 350, content loss 0.27, style loss 0.42, TV loss 0.69, 0.93 sec
epoch 400, content loss 0.27, style loss 0.42, TV loss 0.69, 0.93 sec
epoch 450, content loss 0.27, style loss 0.42, TV loss 0.69, 0.93 sec

plt.imshow(postprocess(big_output));

[Figure: synthesized image at 300 × 450]
As we can see, each epoch takes more time because of the larger image size. As shown in the figure above, the larger synthesized image retains more detail: not only does it contain large color blocks similar to those in the style image, but the blocks even show subtle texture.

Summary

  • The loss function commonly used in style transfer has three parts: the content loss makes the synthesized image and the content image close in content features, the style loss makes the synthesized image and the style image close in style features, and the total variation loss helps reduce noise in the synthesized image.
  • We can extract image features with a pre-trained convolutional neural network and keep updating the synthesized image by minimizing the loss function.
  • Gram matrices are used to express the style captured by the style-layer outputs.

Exercises

  • Select different content layers and style layers. How does the output change?
  • Adjust the weight hyperparameters in the loss function. Does the output retain more content or contain less noise?
  • Replace the content image and the style image in the experiment. Can you create more interesting synthesized images?

References
[1] Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414-2423).
