[Style Transfer] A Neural Algorithm of Artistic Style

From: Journal of Vision (2015)

Abstract

In painting, people create unique visual experiences by composing an interplay between the content and the style of an image, but the mechanism behind this process remains unclear. Meanwhile, deep neural networks (DNNs) have reached near-human performance in tasks such as object and face recognition. This article therefore uses a DNN to learn the content and the style of an image separately, enabling artistic style transfer.

Section I Introduction

A convolutional neural network (CNN) is a DNN widely used for image processing tasks. A CNN consists of layers of small computing units that process and learn visual information hierarchically in a feedforward manner. The image filters in each layer respond to image features at a certain level of abstraction, so each convolutional layer outputs a set of responses called feature maps. When a CNN is trained for object recognition, increasingly complex features are extracted as processing deepens, so the input image is transformed into representations that capture the actual "content" of the image rather than merely its pixel values. Visualizing the feature maps of each layer shows that features extracted deep in the network capture high-level content information, while the low levels largely reproduce the pixel values of the original image. We therefore call the feature responses in the deeper layers of the network the content representation.

To obtain a style representation of the input image, its texture information must be captured. A feature space is therefore built on top of the filter responses in each layer of the network; it consists of the correlations between the different filter responses over the spatial extent of the feature maps. This yields a stationary, multi-scale description that captures the image's texture information rather than its global arrangement.
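As a concrete illustration, here is a minimal sketch, assuming PyTorch and torchvision (the article names no framework), of collecting intermediate feature maps from the VGG-19 network used later in Section II: a deep layer such as conv4_2 serves as the content representation, while conv1_1 through conv5_1 feed the style representation. The layer indices follow torchvision's `vgg19().features` numbering.

```python
import torch
from torchvision.models import vgg19, VGG19_Weights

# Pretrained VGG-19 convolutional stack; the classifier head is not needed.
vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the network itself stays fixed throughout

STYLE_LAYERS = {0, 5, 10, 19, 28}  # conv1_1, conv2_1, conv3_1, conv4_1, conv5_1
CONTENT_LAYER = 21                 # conv4_2, a deep "content" layer

def extract_features(image: torch.Tensor):
    """Collect the feature maps of the selected layers for one image."""
    style_feats, content_feat = [], None
    x = image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
        if i == CONTENT_LAYER:
            content_feat = x
    return content_feat, style_feats
```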
[Fig 1: Content and style reconstructions from different layers of the CNN]

As Fig 1 shows, after the original image is fed into the network, the filters in each layer produce a set of feature maps, and as the number of filters increases with depth, the spatial size of the feature maps is reduced by downsampling. Content reconstruction: panels a, b, and c are reconstructions from low-level feature maps, which restore the original image almost perfectly; reconstructions from conv4_1 and conv5_1 lose detailed pixel information while preserving the high-level content. Style reconstruction: style information is obtained by computing correlations between filter responses over different subsets of layers, and panels a to e show the style extracted from increasingly large subsets. The visualizations show that including more layers discards more of the detailed pixel information while matching the overall style of the original image at larger scales.

The key finding of this paper is that, in a convolutional neural network, the content representation and the style representation of an image are separable. One can therefore manipulate both representations independently to generate new, meaningful images that combine the content of one image with the style of another. To illustrate this finding, the article takes a photograph of the Neckarfront in Tübingen, Germany as the content image and a series of famous artworks as style images.

[Fig 2: The Neckarfront photograph rendered in the styles of several well-known artworks]

The resulting composites show the original buildings rendered in different artistic styles. The styles in Fig 2 are representations extracted from the full set of style layers; alternatively, a style can be composed from a subset of layers. As Fig 3 shows, transferring a whole image generally uses the style representation extracted from all levels, which gives a smoother and more continuous visual impression.

Of course, image content and style cannot be completely disentangled. When synthesizing an image, the two constraints usually cannot both be matched perfectly at the same time, so the emphasis is set by adjusting the loss function during optimization. For example, emphasizing style learning produces images that carry the texture of the artwork but may retain little of the photograph's content, while emphasizing content learning reproduces the photograph clearly but matches the artistic style poorly. In practice the two must be traded off.

In short, the deep learning framework proposed in this article separates the content and style of an image, so that the original image can be transferred to other styles while its content is retained. The article uses a DNN pre-trained for object recognition to obtain the style representation and the content representation of an image, and is the first to separate content and style features in natural images.
[Fig 3: Compositions using style representations from increasing subsets of layers, at different content/style ratios]

Fig 3 shows the effect of style representations from different levels. The deeper the layers included, the more complex the transferred style, because receptive-field size and feature complexity increase along the network's hierarchy. The number at the top of each column is the ratio of content to style weighting; the larger the value, the more the content is emphasized.

Previous related work mainly dealt with relatively simple images such as landscapes, handwriting, and faces. This article renders real-world photographs in a range of artistic styles, and it does not operate directly on pixel values but in the feature space learned by a DNN.

In addition, this article provides a concise description of the style representation at the level of individual network units, framed in terms of the relationships between the responses of different types of neurons.


Section II Method

The basic network is VGG-19, of which only the convolution and pooling layers are used; the fully connected layers are discarded. In the image generation process, the authors found that replacing max pooling with mean (average) pooling helps gradient flow.
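A minimal sketch of this setup, assuming PyTorch and torchvision (the paper itself only specifies VGG-19 with average pooling): take the convolutional part of a pretrained VGG-19, swap each max-pooling layer for average pooling, and freeze the weights, since only the generated image will be optimized.

```python
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

# Convolutional/pooling stack only; the fully connected layers are discarded.
features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()

# Replace max pooling with average pooling, which the paper reports
# helps gradient flow during image synthesis.
for i, layer in enumerate(features):
    if isinstance(layer, nn.MaxPool2d):
        features[i] = nn.AvgPool2d(kernel_size=2, stride=2)

# Freeze the network: the only quantities optimized later are the pixels
# of the generated image.
for p in features.parameters():
    p.requires_grad_(False)
```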

In general, each layer in the network defines a nonlinear filter bank. The input image is encoded at each layer by the filter responses, producing feature maps of different sizes, and the content loss is defined as the mean squared error between the feature responses of the original image and those of the generated image.
$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^2$$

where $P^l$ and $F^l$ are the layer-$l$ feature representations of the original photograph $\vec{p}$ and the generated image $\vec{x}$.
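As a sketch, the content loss for one layer is a plain sum of squared differences between the generated image's feature maps and the photograph's; the names `F_gen` and `P_orig` are illustrative.

```python
import torch

def content_loss(F_gen: torch.Tensor, P_orig: torch.Tensor) -> torch.Tensor:
    # L_content = 1/2 * sum_ij (F_ij - P_ij)^2 at the chosen layer.
    return 0.5 * (F_gen - P_orig).pow(2).sum()
```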

The style representation is built on top of the filter responses in each layer: it computes the correlations between the different filter responses, given by the Gram matrix, whose entry $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$ is the inner product between the vectorized feature maps $i$ and $j$ in layer $l$.

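A minimal sketch of the Gram matrix computation, assuming a PyTorch tensor of shape `(1, N, H, W)` holding one layer's filter responses:

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """G_ij = sum_k F_ik * F_jk: inner products between vectorized feature maps."""
    _, n, h, w = feat.shape
    F_l = feat.view(n, h * w)  # one row per filter, flattened over space
    return F_l @ F_l.t()       # (N, N) correlation matrix
```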

Style learning is achieved by minimizing the difference between the Gram matrices of the style image and the generated image, with a weighting factor $w_l$ representing the contribution of each layer.
$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^2, \qquad \mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l} w_l E_l$$

where $G^l$ and $A^l$ are the layer-$l$ Gram matrices of the generated image $\vec{x}$ and the style image $\vec{a}$, and $N_l$ and $M_l$ are the number of filters and the size of each feature map in layer $l$.
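Building on the `gram_matrix` sketch above, the style loss can be written as a weighted sum of the per-layer terms $E_l$; the function below is illustrative, with `weights` playing the role of $w_l$.

```python
def style_loss(gen_feats, style_feats, weights):
    """Weighted sum of E_l terms over the chosen style layers."""
    loss = 0.0
    for F_gen, F_sty, w_l in zip(gen_feats, style_feats, weights):
        _, n, h, w = F_gen.shape
        G = gram_matrix(F_gen)  # generated image
        A = gram_matrix(F_sty)  # style (art) image
        E_l = ((G - A) ** 2).sum() / (4 * n ** 2 * (h * w) ** 2)
        loss = loss + w_l * E_l
    return loss
```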
Through the joint optimization of the content loss and the style loss, the generated image learns the content of the photograph and the style of the artwork at the same time; the weights $\alpha$ and $\beta$ determine the relative emphasis placed on each:

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \, \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta \, \mathcal{L}_{style}(\vec{a}, \vec{x})$$
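Putting the pieces together, a minimal optimization loop might look like the following sketch, which reuses the helper functions above; `photo` and `art` are assumed to be preprocessed image tensors, and the $\alpha$, $\beta$, and $w_l$ values are illustrative rather than the paper's.

```python
import torch

# Precompute the fixed targets from the photograph and the artwork.
with torch.no_grad():
    P_target, _ = extract_features(photo)  # content target
    _, A_feats = extract_features(art)     # style targets

x = photo.clone().requires_grad_(True)     # optimize the image itself
optimizer = torch.optim.LBFGS([x])
alpha, beta = 1.0, 1e3                     # content/style trade-off (illustrative)
w = [0.2] * 5                              # equal w_l over the five style layers

def closure():
    optimizer.zero_grad()
    content_feat, style_feats = extract_features(x)
    loss = alpha * content_loss(content_feat, P_target) \
         + beta * style_loss(style_feats, A_feats, w)
    loss.backward()
    return loss

for _ in range(20):                        # each L-BFGS step runs several closures
    optimizer.step(closure)
```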

Origin blog.csdn.net/qq_37151108/article/details/107318706