Neural Style Transfer: A Review

Abstract

The work of Gatys et al. demonstrated the power of convolutional neural networks (CNNs) for artistic creation. Rendering an image in different artistic styles with a CNN is called Neural Style Transfer (NST). This article summarizes the research progress of NST and, building on that summary of the various NST algorithms, presents a qualitative and quantitative comparison of the methods. For details, refer to the related papers and source code.

Section I Introduction

Painting is a form of artistic creation. For thousands of years people have been captivated by great works of art such as Van Gogh's "The Starry Night", but reproducing a painting in a specific style requires a professionally trained painter. This has also attracted computer scientists to explore how paintings can be generated automatically. Non-photorealistic rendering (NPR) is widely used in computer graphics for this purpose, but most NPR methods are designed for one specific artistic style. Style transfer addresses a more general problem: by learning texture features, it transforms an image from the source domain into the target style domain. Early work by Hertzmann et al. relied on low-level features and sometimes failed to capture image structure. More recently, drawing on the strong learning ability of CNNs, Gatys et al. were the first to use a CNN for the style transfer of natural images: the content and style representations of the images are extracted with a pre-trained CNN, and the content loss and style loss are then optimized iteratively to produce the stylized image. Fig. 1 shows the Great Wall rendered in the style of the traditional Chinese painting "Dwelling in the Fuchun Mountains". The method of Gatys et al. needs no ground-truth stylized images for training and is not restricted to particular styles, which made it the pioneering work of the field of neural style transfer.
[Fig. 1: the Great Wall rendered in the style of a traditional Chinese painting]
Researchers in both academia and industry have since made a series of improvements on the work of Gatys et al. This article gives an overview of the development of NST up to March 2018. Its main contributions are threefold:
(1) Summarize the NST algorithms.
(2) Propose evaluation methods and metrics for comparing different NST algorithms.
(3) Summarize the challenges in the NST field and the directions that can be explored in the future.
 Section II reviews non-photorealistic rendering;
 Section III introduces the foundations of NST;
 Section IV classifies and summarizes the NST algorithms;
 Section V covers improvements and extensions of the basic algorithms;
 Section VI introduces the evaluation methodology for NST;
 Section VII presents applications of NST;
 Section VIII discusses the challenges that NST faces, and Section IX concludes the article.

Section II Non-Photorealistic Rendering

Artistic stylization has long been an active research field because of its wide range of applications. Before CNNs appeared, it was mainly carried out with NPR. For 2D images, the relevant techniques are known as image-based artistic rendering (IB-AR) and fall into the following categories:

Stroke-Based Rendering: Renders a photo in a fixed style by placing virtual strokes. The disadvantage is that each stroke model is designed for one particular style, so the approach is not flexible.
Region-Based Techniques: The image is first segmented into regions, and different stroke patterns are used for different semantic regions. The limitation is that region-based methods, too, cannot render arbitrary artistic styles.
Example-Based Rendering: The example-based method (image analogy) learns the mapping between pairs of original images and stylized images in a supervised manner, and therefore generalizes to a variety of artistic styles. In practice, however, paired training data are hard to obtain, and usually only low-level image features are used, so content and style information cannot be extracted effectively, which limits performance.
Image Processing and Filtering: Artistic rendering can also be achieved with image filters, but only a limited range of styles can be produced.
Based on the above discussion, although IB-AR algorithms can perform artistic rendering without CNNs, they are limited in flexibility, style diversity, and the effectiveness of feature extraction. Neural Style Transfer was proposed to address these problems.

Section III Basics of Style Transfer

In order to better understand the development of NST, it is necessary to introduce its origins. To automate style transfer, the first question is how to model and extract style information from an image. Because style is closely related to texture, this question leads back to visual texture modeling. The next question is how to reconstruct an image in the target style while retaining the content, which is the task of image reconstruction.
Part A Visual Texture Modeling
Texture synthesis requires a model of image texture. There are two approaches: parametric texture modelling with summary statistics, and non-parametric texture modelling with Markov random fields (MRFs).
Parametric Texture Modelling with Summary Statistics: First proposed by Julesz, this approach models the texture of an image as N-th order statistics of its pixels. Gatys et al. were the first to compute such statistics in a CNN: the texture is characterized by the Gram matrix of the feature maps, G = F F^T, where F is the feature map of a layer reshaped to (channels x positions), which encodes second-order statistics of the filter responses. Computing Gram matrices on CNN features models both natural and non-natural textures. However, the Gram matrix captures only global statistics and discards spatial layout, so it handles textures with long-range symmetric structure poorly. To address this, some researchers translate the feature maps horizontally and vertically before correlating them, so that the relation between distant positions is captured.
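As a minimal sketch of the Gram-matrix texture descriptor described above, the NumPy snippet below computes G = F F^T for a feature map of shape (channels, height, width). The function name `gram_matrix` and the random stand-in features are assumptions for illustration; in practice the features would come from a pre-trained CNN, and implementations often add a normalization factor.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-by-channel correlations."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # each row is one filter response
    return flat @ flat.T                # (C, C), sums over all spatial positions

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8))  # stand-in for CNN activations
g = gram_matrix(feats)

# Shuffling the spatial positions leaves the Gram matrix unchanged,
# which illustrates why the spatial layout is lost.
perm = rng.permutation(8 * 8)
shuffled = feats.reshape(4, -1)[:, perm].reshape(4, 8, 8)
```

Because the sum runs over all positions, `gram_matrix(shuffled)` equals `g`: the descriptor keeps which filters fire together but not where.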
Non-parametric Texture Modelling with MRFs: Non-parametric modelling assumes that each pixel in a texture image is determined by its spatial neighbors. Based on this assumption, Efros and Leung proposed synthesizing pixels one at a time by searching the source texture for similar neighborhoods and assigning the corresponding pixel value.
Part B Image Reconstruction
Many vision tasks extract abstract features from an input image. Image reconstruction is the inverse: it explores how to recover the input image from the extracted features, which also reveals what information those abstract features contain. Algorithms that reconstruct images from CNN features fall into two groups:
Image-Optimisation-Based Online Image Reconstruction (IOB-IR) and
Model-Optimisation-Based Offline Image Reconstruction (MOB-IR).
IOB-IR iteratively optimizes the image itself until its feature representation is similar to the target representation, but this takes a long time, especially for large images.
MOB-IR trains a feed-forward network in advance, which shifts the computational burden to the training phase. At test time the inversion is completed with a single forward pass, which is far more efficient. It can also be combined with a GAN to improve results.
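The idea behind IOB-IR can be sketched with a toy example. Here a fixed random linear map `W` stands in for the CNN feature extractor (an assumption made only so the example is small and self-contained; a real CNN is nonlinear and the problem is then non-convex), and the "image" is recovered by gradient descent on the feature-matching error.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16))        # toy feature extractor: phi(x) = W @ x

def features(x):
    return W @ x

def reconstruct(target_feat, steps=2000):
    """Iteratively update x so that features(x) approaches target_feat."""
    x = np.zeros(W.shape[1])             # start from a blank "image"
    step = 1.0 / np.linalg.norm(W, 2) ** 2   # 1 / (largest singular value)^2
    for _ in range(steps):
        residual = features(x) - target_feat
        x -= step * (W.T @ residual)     # gradient of 0.5 * ||W x - target||^2
    return x

x_true = rng.standard_normal(16)
x_rec = reconstruct(features(x_true))
```

Every reconstruction requires its own optimization loop, which is the cost that MOB-IR avoids by moving the work into a network trained once.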

Section IV Neural Style Transfer Algorithm

Neural Style Transfer belongs to the example-based IB-AR methods described above. This section introduces style transfer algorithms for 2D images, together with their characteristics and limitations.
A major uncertainty in style transfer lies in the definition of "style". How should the success of a style transfer be judged: by fine details, by semantic information, or by something else? The answer affects how the algorithms are evaluated.
This article divides NST algorithms into two categories: IOB-NST and MOB-NST.
IOB-NST: Iteratively optimizes the image to complete the style transfer [slow];
MOB-NST: Optimizes a generative model offline and then produces the stylized image with a single forward pass [fast]. See Fig. 2 for details.
Part A IOB-NST
DeepDream laid the foundation for IOB-NST research. The basic idea of IOB-NST is to first extract the features of the style image and the content image, combine the two to obtain the target representation, and then iteratively optimize the reconstructed image until it matches that target. The limitation of IOB-NST is the high computational cost of iterative optimization.
1. Parametric Neural Methods with Summary Statistics
The algorithm of Gatys et al. reconstructs the image from features extracted from intermediate layers of VGG-19. It is a parametric method based on summary statistics. The image is obtained by jointly minimizing a content loss and a style loss, so the final result has the characteristics of both.
The total loss is a weighted sum of the two, with weight alpha on the content loss and beta on the style loss. By adjusting alpha and beta, the emphasis can be shifted towards content or towards style.
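A minimal sketch of how the alpha and beta weights enter the objective, written over precomputed feature maps. The function names, the mean-squared normalization, and the default weights are illustrative assumptions, not the exact constants of the original algorithm.

```python
import numpy as np

def gram(feat):
    """Gram matrix of a (C, H, W) feature map, averaged over positions."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)
    return flat @ flat.T / flat.shape[1]

def content_loss(gen_feat, content_feat):
    """Squared feature difference at the content layer."""
    return float(np.mean((gen_feat - content_feat) ** 2))

def style_loss(gen_feats, style_feats):
    """Sum over style layers of squared Gram-matrix differences."""
    return float(sum(np.mean((gram(g) - gram(s)) ** 2)
                     for g, s in zip(gen_feats, style_feats)))

def total_loss(gen_content_feat, content_feat, gen_style_feats, style_feats,
               alpha=1.0, beta=1e3):
    """Weighted sum: alpha * content loss + beta * style loss."""
    return (alpha * content_loss(gen_content_feat, content_feat)
            + beta * style_loss(gen_style_feats, style_feats))
```

Raising `beta` relative to `alpha` pushes the optimized image towards the style statistics; lowering it keeps the result closer to the content image.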
In style transfer, the choice of content and style layers is very important. Choosing different layers, or different numbers of filters in each layer, has a great influence on the final result. Gatys et al. use {relu1_1, relu2_1, relu3_1, relu4_1, relu5_1} for style, so the style representation combines information from multiple levels. This is key to the success of the method and makes the resulting style smoother and more continuous. For content, the higher-level layer relu4_2 is used rather than a low-level layer: low-level features retain many details, whereas some content details have to change for the result to match the style. The algorithm of Gatys et al. needs no ground truth for training and places few constraints on the type of style image, in sharp contrast with earlier IB-AR algorithms.
However, the algorithm still has limitations. It does not preserve the fine details of the image, because the CNN inevitably discards some low-level information. The Gram matrix as a feature representation is not suitable for photorealistic rendering, and the algorithm does not take the semantic content of the image into account. These are all important factors that affect the result.
Besides the Gram matrix, texture can be characterized in other ways, for example from the viewpoint of domain adaptation. In domain adaptation, the training and test data come from different distributions: a model learns from labeled data in the source domain and predicts unlabeled data in the target domain. One approach is to minimize the Maximum Mean Discrepancy (MMD) between the two domains so that their data distributions are aligned. In NST, the MMD between the features of the style image and those of the reconstructed image is minimized, so that the feature distribution of the reconstructed image is as close as possible to that of the style image, which achieves the style transfer.
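The link between Gram matching and MMD can be checked numerically. The sketch below relies on the known result that, with the second-order polynomial kernel k(x, y) = (x^T y)^2, the squared difference of unnormalized Gram matrices equals N^2 times the (biased) squared MMD estimate when both feature sets have N positions. The function names and random features are assumptions for illustration.

```python
import numpy as np

def poly2_kernel(x, y):
    """k(x_i, y_j) = (x_i^T y_j)^2 for the columns of x (C, N) and y (C, M)."""
    return (x.T @ y) ** 2

def mmd2(x, y):
    """Biased estimate of squared MMD between the column sets of x and y."""
    return (poly2_kernel(x, x).mean() + poly2_kernel(y, y).mean()
            - 2.0 * poly2_kernel(x, y).mean())

rng = np.random.default_rng(0)
f_style = rng.standard_normal((3, 50))   # C = 3 channels, N = 50 positions
f_gen = rng.standard_normal((3, 50))
n = f_style.shape[1]

# Squared Frobenius distance between the two unnormalized Gram matrices
gram_gap = np.sum((f_gen @ f_gen.T - f_style @ f_style.T) ** 2)
```

Here `gram_gap` matches `n**2 * mmd2(f_gen, f_style)` up to floating-point error, so swapping the kernel gives other style losses within the same framework.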
However, Gram-based optimization can be unstable and requires careful, tedious tuning. Risser et al. observed that feature activations with quite different distributions can still have the same Gram matrix, and therefore proposed an additional loss based on feature histogram statistics.
Matching the histograms of feature activations makes the optimization more stable and reduces the number of iterations, at the cost of higher computational overhead. In addition, the weaknesses of the Gatys algorithm noted above, namely its neglect of depth information and of the coherence of details, remain unresolved.
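One simple way to realize histogram matching is rank-based remapping, sketched below. It assumes the two activation arrays have the same number of elements and matches over the whole array; a per-channel variant would apply the same function channel by channel. The function names are hypothetical, and this is an illustration of the idea rather than the exact procedure of Risser et al.

```python
import numpy as np

def match_histogram(source, target):
    """Remap source values so that their histogram equals that of target.

    The k-th smallest source value is replaced by the k-th smallest
    target value, so the rank order of the source is preserved.
    """
    src = source.ravel()
    order = np.argsort(src)
    matched = np.empty_like(src)
    matched[order] = np.sort(target.ravel())
    return matched.reshape(source.shape)

def histogram_loss(activation, style_activation):
    """Squared distance between an activation and its histogram-matched version."""
    remapped = match_histogram(activation, style_activation)
    return float(np.sum((activation - remapped) ** 2))
```

The loss is zero when the activation already has the style histogram, and it penalizes differences in the full value distribution, not only in second-order statistics.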
The CNN-based algorithms above inevitably lose detail and sometimes distort image structure. Some researchers introduce additional constraints that retain low-level information, so that the details of the original image are better preserved while the style is transferred.
2. Non-parametric Neural Methods with MRFs 
The non-parametric methods based on MRFs perform style transfer by patch matching. The feature maps of the reconstructed image are divided into patches, and each patch is optimized towards its most similar patch in the style image, so that local details are better preserved. This type of algorithm works best when the content and style images are similar.
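The patch-matching step can be sketched as follows. Patches are taken from (C, H, W) feature maps with stride 1, and each patch of the generated image is paired with the style patch of highest cosine similarity (normalized cross-correlation). The patch size, the function names, and the brute-force search are assumptions chosen for brevity.

```python
import numpy as np

def extract_patches(feat, size=3):
    """All size x size patches of a (C, H, W) map, flattened, stride 1."""
    c, h, w = feat.shape
    patches = [feat[:, i:i + size, j:j + size].ravel()
               for i in range(h - size + 1)
               for j in range(w - size + 1)]
    return np.stack(patches)

def nearest_style_patches(gen_feat, style_feat, size=3):
    """Pair each generated patch with its most similar style patch."""
    gen = extract_patches(gen_feat, size)
    sty = extract_patches(style_feat, size)
    gen_n = gen / (np.linalg.norm(gen, axis=1, keepdims=True) + 1e-8)
    sty_n = sty / (np.linalg.norm(sty, axis=1, keepdims=True) + 1e-8)
    idx = np.argmax(gen_n @ sty_n.T, axis=1)   # best cosine similarity
    return gen, sty[idx]

def mrf_style_loss(gen_feat, style_feat, size=3):
    """Sum of squared distances between patches and their matched style patches."""
    gen, matched = nearest_style_patches(gen_feat, style_feat, size)
    return float(np.sum((gen - matched) ** 2))
```

Because the loss is defined on local patches rather than global statistics, it rewards locally plausible texture, which is why local detail tends to be better preserved.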
Part B MOB-NST 
The limitation of IOB-NST is its low efficiency and heavy consumption of computing resources. MOB-NST instead trains a network offline and, at test time, produces the stylized image with a feed-forward pass only, which solves the efficiency problem. According to the number of styles a single model can handle, MOB-NST can be subdivided into:
Per-Style-Per-Model (PSPM),
Multiple-Style-Per-Model (MSPM) and
Arbitrary-Style-Per-Model (ASPM).
1. Per-Style-Per-Model (PSPM NST)
 In PSPM, one network is trained in advance for each style, and at test time the stylized image is obtained by passing the content image through the feed-forward network once. The two earliest attempts differ only in network architecture. The biggest advantage is real-time style transfer. It was later found that normalizing every single image, rather than a batch of images as in batch normalization (BN), significantly improves stylization quality and speeds up convergence. This per-image normalization is called instance normalization (IN) and is equivalent to BN with a batch size of 1. One possible explanation is that IN is a form of style normalization: it directly normalizes each content image to the desired style, and the rest of the network only needs to take care of the content loss.
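The relation between IN and BN at batch size 1 can be verified with a short NumPy sketch. Both functions below omit the learnable scale and shift, and `batch_norm_train` uses the statistics of the current batch as in training mode; the names are illustrative.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each sample and channel of x (N, C, H, W) over H and W."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm_train(x, eps=1e-5):
    """Training-mode batch norm: statistics per channel over N, H and W."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

Applying `batch_norm_train` to one image at a time gives the same output as `instance_norm` on the whole batch, while applying it to the full batch does not, because the statistics are then shared across images.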
 There is also a non-parametric PSPM algorithm, which is patch-based and preserves coherent texture well in the generated image. However, it performs poorly on images without pronounced texture, such as faces.
 2. Multiple-Style-Per-Model (MSPM NST)
 PSPM methods are two orders of magnitude faster than IOB-NST methods, but each model handles only one style, which is inflexible: transferring multiple styles requires training many largely redundant networks. MSPM was therefore proposed to integrate multiple styles into one network. There are two approaches: one ties a small number of parameters in the network to each style, so that training for a new style only trains those parameters; the other feeds both the style and the content to the network as inputs.
2.1 Tying only a small number of parameters to each style
Related work follows the introduction of IN; see [53]. Dumoulin et al. found that, with the convolutional parameters shared across styles, scaling and shifting the parameters of the IN layers alone is enough to model different styles. They therefore proposed conditional instance normalization (CIN):
CIN(F(I_c), s) = gamma^s * (F(I_c) - mu(F(I_c))) / sigma(F(I_c)) + beta^s
where F is the feature activation, I_c is the content image, and s is the index of the desired style. Different styles are modeled through the style-specific scaling gamma^s and shifting beta^s.
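A minimal sketch of conditional instance normalization for a single feature map, assuming one scale vector and one shift vector per style stored in arrays `gammas` and `betas` of shape (num_styles, C). The names and shapes are illustrative choices, not the original implementation.

```python
import numpy as np

def conditional_instance_norm(feat, style_idx, gammas, betas, eps=1e-5):
    """Instance-normalize feat (C, H, W), then scale and shift per style."""
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(feat.var(axis=(1, 2), keepdims=True) + eps)
    normalized = (feat - mean) / std
    gamma = gammas[style_idx][:, None, None]   # per-channel scale of style s
    beta = betas[style_idx][:, None, None]     # per-channel shift of style s
    return gamma * normalized + beta
```

Switching `style_idx` changes only the 2 * C numbers used in this layer, while the convolution weights around it stay shared, which is what keeps the per-style cost small.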
The StyleBank work of Chen et al. follows the same idea: it decouples style from content and learns them with separate network components. A set of layers in the network is bound to each style and learned separately; these layers are called the "StyleBank". With this design, the content part can be fixed and only the StyleBank layers trained when a new style is added.
Both implementations of MSPM learn multiple styles efficiently, but they still do not solve the limitations of NST noted earlier, such as the neglect of depth and semantic information.
2.2 Combining both style and content as inputs
The limitation of the approach in 2.1 is that the model grows as the number of styles increases. The second approach therefore feeds both content and style into the network as inputs, in order to fully exploit the capacity of a single network to perform the style transfer. The difference between the two approaches is thus how the style information is incorporated into the network.
In [55], a selection unit is designed to choose among the N styles. The style input is initially sampled at random from a style distribution, the content image is fed in as the network input, and an encoder-decoder network produces the final stylized image. This avoids the growth in model size caused by multiple styles.
3. Arbitrary-Style-Per-Model (ASPM NST)
 ASPM transfers any style with a single network. There are two kinds of implementation, non-parametric and parametric.
 (1) Non-parametric ASPM
  [57] extracts patches from the content and style activation maps of a layer in a pre-trained VGG network, replaces each content patch with the closest style patch (an operation called "style swap"), and then reconstructs the image from the swapped activations. However, the results of this algorithm are somewhat unsatisfactory: the content is well preserved, but the swapped activations still differ considerably from the target style.
  (2) Parametric ASPM
The most direct way to transfer an arbitrary style is to train a separate network P that predicts the style parameters. Once these parameters are predicted for a style, that style can be transferred, but training requires a large amount of data.

Section V Improvements and Extensions

Improvements and extensions of NST algorithms have developed in the following directions: controlling perceptual factors, style transfer for specific kinds of images (doodles, portraits, video), and even audio.
Part A Controlling Perceptual Factors
Gatys et al. extended their own work with a spatial control strategy for applying different styles to different regions. Guidance channels with values in [0, 1] are introduced to specify which style is transferred to which region.
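One way to use a guidance channel is to weight the feature map with it before computing the Gram matrix, so that only the selected region contributes to the style statistics. The sketch below assumes a guidance map already resized to the spatial size of the feature map; the function name is hypothetical.

```python
import numpy as np

def guided_gram(feat, guidance):
    """Gram matrix of feat (C, H, W) restricted by a guidance map (H, W) in [0, 1]."""
    c = feat.shape[0]
    masked = (feat * guidance[None, :, :]).reshape(c, -1)
    return masked @ masked.T
```

With one guidance map per region, a separate style target can be set for each region, for example one style for the sky and another for the ground, and the region-wise style losses are then added together.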
Controlling stroke size is more complicated, and different stroke sizes lead to very different stylization results.
For IOB-NST, different stroke sizes can only be achieved by rescaling the style image. For MOB-NST, the content image can be rescaled before it is fed to the network, or models can be trained with style images of different scales. ASPM inevitably has to trade off speed against quality.
Jing et al. [61] were the first to let a single network produce multiple stroke sizes, which better addresses both efficiency and image quality.
For high-resolution images, it is not enough for IOB-NST simply to enlarge the style image. A coarse-to-fine strategy is more common: the image passes through a series of sub-networks, each of which takes the upsampled stylized output of the previous one as input, so that the stylization is refined step by step.
MOB-NST handles high-resolution images in a similar coarse-to-fine manner.
Part B Semantic Style Transfer
Given a content image and a style image with similar content, semantic style transfer establishes a semantic correspondence between the two and maps each style region to the semantically similar content region, so that the style is transferred between regions with similar semantics. There are two main kinds of method. The first is image-based and needs either manual annotations or a segmentation network to divide the regions. The second is model-based; here efficiency is the main limiting factor.
Part C Instance Style Transfer
Instance style transfer builds on instance segmentation and stylizes only a single segmented instance. The difficulty lies in handling the boundary with the background that should not be stylized, for example by adding an extra loss term that smooths the boundary.
Doodle Style Transfer
Another interesting application of NST is turning rough doodles into refined artworks. The content loss is dropped and the doodle is used as a segmentation map instead.
Stereoscopic Style Transfer
Driven by the needs of AR/VR, some researchers have studied style transfer for stereoscopic images. [72] proposed a disparity loss so that the stylized results of the different views remain consistent.
Portrait Style Transfer
If existing NST algorithms are applied directly to portraits, the face is deformed. [73] adds spatial constraints so that the facial structure is preserved during stylization.
Video Style Transfer
NST for video was proposed shortly after Gatys et al. introduced NST for still images. The difference is that the stylization must change smoothly between adjacent frames. Again there are two types: image-optimisation-based and model-optimisation-based methods.
(1) Image Optimisation based Online Video Style Transfer
 Ruder et al. were the first to stylize video, using optical flow and a temporal consistency loss that makes the stylization change smoothly from frame to frame. However, it takes several minutes to stylize a single frame.
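A temporal consistency loss of this kind can be sketched as a masked mean squared difference between the current stylized frame and the previous stylized frame warped by optical flow. The warping and the validity mask are assumed to be computed elsewhere; the function and argument names are illustrative.

```python
import numpy as np

def temporal_consistency_loss(stylized_t, warped_prev, valid_mask):
    """Masked mean squared difference between consecutive stylized frames.

    stylized_t:  current stylized frame, shape (H, W, 3)
    warped_prev: previous stylized frame warped to the current frame, (H, W, 3)
    valid_mask:  (H, W), 1 where the optical flow is reliable, 0 at
                 occlusions and motion boundaries
    """
    diff = (stylized_t - warped_prev) ** 2
    weights = valid_mask[..., None]        # broadcast the mask over color channels
    return float(np.mean(weights * diff))
```

Pixels where the flow is unreliable are excluded, so the loss does not force consistency across occlusions, where the content genuinely changes.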
 (2) Model Optimisation based Offline Video Style Transfer
 Huang et al. [78] build on PSPM: two consecutive frames are fed through the style transfer network separately, and a consistency loss is computed on the two outputs to enforce pixel-level consistency. [80] uses a sub-network to generate features, which are combined with optical flow information and passed to an encoder-decoder style transfer network.

Character Style Transfer
Character style transfer aims to generate new fonts or text effects. Recent work uses a conditional GAN (cGAN) to predict the glyph shapes and then an ornamentation network to predict colors and textures; the two networks are trained jointly.
Photorealistic Style Transfer
Photorealistic style transfer aims to transfer the color distribution while retaining all the content of the original image without introducing distortion. It is again divided into image-optimisation-based and model-optimisation-based methods.
IOB: [84] first proposed a two-stage approach: the first stage performs the style transfer and the second removes distortions, but the computational cost is very high.
MOB: [86] improves efficiency and also has two steps, a stylisation step and a smoothing step. The stylisation step applies the NST algorithm of [59] with its upsampling layers replaced by unpooling layers, and the smoothing step further removes distortions and artifacts.
Fashion Style Transfer
Fashion style transfer aims to render clothing in different fashion styles, producing a customized style while preserving the details of the garment. [89] proposed achieving this with a set of fashion style generators and discriminators.
Audio Style Transfer
Beyond images, style transfer can also be applied to audio to synthesize specific sound effects. It follows the same two routes: audio-optimisation-based and model-optimisation-based methods.

Section VI Evaluation Methodology

There is still no unified metric for evaluating NST results; this remains an open question. Evaluation mainly covers two aspects: quality and efficiency. Quality is usually judged by human observers and therefore depends on subjective factors such as the observer's age and occupation. Efficiency can be assessed with well-defined metrics such as time complexity and loss variation.
Part A Dataset
The dataset consists of 10 style images and 20 content images. The style images cover impressionism, cubism, abstract art, modernism, surrealism, expressionism, and so on. Fig. 4 shows some examples; some were painted on canvas, and others on cardboard or polyester.
The content images are taken from the NPRgeneral benchmark proposed by Mould and Rosin. The offline models are trained on the MS-COCO dataset.
[Fig. 4: examples of the style images]
To make the comparison fair, the experiments follow these principles:
(1) Follow the original experimental settings of each algorithm as closely as possible.
(2) Because the content and style weights strongly affect the final stylization, choose for each algorithm the weights that give its best result.
(3) Keep all other parameters and settings at the originals as far as possible. Algorithm-specific details are as follows:
[Table: implementation details of the compared algorithms]
Part B Qualitative Evaluation
Fig. 5, Fig. 7 and Fig. 9 visualize some of the stylized results. Fig. 5 shows the results of the IOB-NST algorithms and some PSPM-MOB-NST algorithms, with content images from [92] and [93]. Fig. 7 shows the results of the MSPM-MOB-NST algorithms, and Fig. 9 those of the ASPM-MOB-NST algorithms.
IOB-NST performs the style transfer online, which is computationally more expensive but gives better stylization. The result of the Gatys algorithm is generally used as the gold standard. Each PSPM model handles only one style. The results of Ulyanov et al. and Johnson et al. are visually similar, while those of Li and Wand are slightly less impressive. Although GAN training introduces some instability, this article considers GANs a very promising direction for NST.

Fig. 7: MSPM performs the transfer of multiple styles with a single network. In the work of Dumoulin et al. and Chen et al., for example, each style is represented by a similar number of parameters, so their results are similar; the drawback, as noted earlier, is model size. The ASPM results are less impressive than the previous ones, which is understandable, since ASPM trades off speed, flexibility and image quality. Among the arbitrary-style methods, that of Chen and Schmidt is patch-based, and its results seem to carry too little of the style; that of Ghiasi et al. is data-driven, so its quality depends heavily on the amount of training data for each style; and that of Huang and Belongie is based on feature statistics and gives the best visual results, but seems less capable on complex patterns.
Saliency Comparison
NST is a process of artistic creation, and the definition of style is complex and subjective: a result that some observers consider very successful may be judged the opposite by others. Since the goal of this article is to compare the algorithms as objectively as possible, a comparison is also made on the basis of saliency maps. The saliency maps for IOB-NST, PSPM-MOB-NST and MSPM-MOB-NST are shown in Fig. 6, Fig. 8 and Fig. 10, respectively. The analysis shows that the MSPM algorithms preserve saliency more consistently.
Part C Quantitative Evaluation
The quantitative evaluation focuses on five indicators: the time to stylize a single content image at different sizes, the training time for a single model, the average loss over the content images (a measure of how well the loss function is minimized), the loss variation during training (which reflects how fast the model converges), and style scalability.
(1) Stylisation speed
This mainly evaluates the efficiency of the MOB-NST algorithms. Table II shows the average stylization time at three resolutions (256, 512 and 1024), measured over 100 images. The fifth column shows how many styles each algorithm can produce.
Except for [57] and [59], the MOB-NST algorithms run in real time. ASPM is generally slower than PSPM and MSPM, which again reflects the three-way trade-off mentioned above.
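Average stylization times of this kind can be measured with a small timing harness. The sketch below assumes a callable `stylize(image)` that wraps whichever model is being evaluated; both the callable and the warm-up convention are assumptions for illustration, and GPU models would additionally need device synchronization before reading the clock.

```python
import time

def average_stylization_time(stylize, images, warmup=1):
    """Average wall-clock seconds per image for a stylization function."""
    for img in images[:warmup]:
        stylize(img)                      # warm-up runs are not timed
    start = time.perf_counter()
    for img in images:
        stylize(img)
    return (time.perf_counter() - start) / len(images)
```

Running the harness on the same 100 images at each resolution yields one average per resolution that can be compared across algorithms.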
(2) Training time
The training time of a model is also important, but training times of different algorithms are hard to compare, because some models already produce good results after only a few iterations. The figures here should therefore be taken only as a rough reference:
Johnson [47]: 3.5 h
Ulyanov [48]: 3.0 h
Li [52]: 2.0 h
Chen [57] / [51]: longer
(3) Loss comparison
The convergence of the networks can be seen from the loss curves:
[Figure: loss curves of the compared algorithms during training]
(4) Style scalability
Style scalability is an important consideration for MSPM, but it is hard to measure, because how many styles a single model can hold depends strongly on the particular set of styles. For details, see Table III.
E, AS and LF stand for Efficient, Arbitrary Style and Learning-Free, respectively. The results of Gatys et al. are usually taken as the gold standard, and for PSPM the methods [4], [50] and [52] are compared. MSPM models are generally larger, and the ASPM results are not as good as those of PSPM and MSPM.

Section VII Applications

Part A Social Communication
One reason why style transfer is so popular in both academia and industry is its popularity on social networking sites such as Facebook and Twitter. The mobile application Prisma became highly sought after around the world thanks to its good stylization results. NST has also brought profits to some providers; on Ostagram, for example, users can pay to obtain their stylized images faster.
The use of NST on social networks has in turn driven progress in the algorithms; for example, user feedback helps to improve them further. To meet real-time requirements, FAIR developed the mobile deep learning framework Caffe2Go and later Caffe2.
Part B User-Assisted Creation Tools
NST can also be used as a user-assisted creation tool, which can provide designers and architects with stylized image-assisted design, reduce workload and improve work efficiency.
Part C Production Tools for Entertainment Applications
NST can also assist the production of film and television works. Animation, for example, requires 8 to 24 frames per second; generating the animation style automatically with NST would greatly reduce production cost and time. The same applies to movies and games. There have been successful attempts in this direction, such as [106] for 3D rendering.

Section VIII Future Challenges

The development of NST algorithms and their industrial applications show broad prospects, but some challenges and open issues remain. Because NST is closely related to NPR, the two fields share some limitations. This section therefore first discusses the limitations common to both, and then the problems specific to NST.
Part A Evaluation Methodology
The evaluation of aesthetic and artistic quality is very important in both NPR and NST. Researchers need reliable and credible criteria for assessing their algorithms. However, most current evaluations are subjective. For example, [59] relies on user voting, but this is not an ideal method: as the saying goes, there are a thousand Hamlets in a thousand people's eyes.
This article also conducted a user study, in which 4 men and 4 women of the same age and occupation rated the stylized results. Even people with similar backgrounds gave different ratings. How to judge a stylized image is therefore still an open question, and may need the joint judgment of professional institutions and art experts.
NST does not yet have a standard benchmark. The content images used in this article come from NPRgeneral, but there is no agreed benchmark set of style images, partly because there are no fixed requirements on style type.
Based on the above, the standardization of NST evaluation is an important problem that deserves further study.
Part B Interpretable Neural Style Transfer
Like other neural-network-based algorithms, NST treats the network as a black box, so the interpretability of NST algorithms is also necessary. The discussion here focuses on three topics: disentangled feature representations, the normalization methods used in NST, and adversarial samples in NST.
(1) Representation disentangling
Representation disentangling explores how interpretable the individual dimensions of a feature representation are, aiming for representations in which changing one factor does not affect the others. This is also helpful for research on machine learning and transfer learning. For example, clarifying the relationship between parameters and color, shape, and stroke size would allow precise control over the stylization. Current methods are either supervised or unsupervised: supervised methods require label information, while unsupervised methods do not but often yield features whose interpretability is unsatisfactory.
Clarifying which parameters in NST control the stylization effect is therefore also a research direction.
(2) Normalization methods
The evolution of normalization methods has also significantly affected the results of NST. Table 4 lists some normalization methods. The first one used for NST was instance normalization (IN), which is equivalent to batch normalization (BN) with a batch size of 1. It makes the network converge faster and gives better stylization. One explanation is that IN lets the network discard contrast information in the content image, which simplifies learning. With IN, the style of any image can be normalized to a specific target style, so the rest of the network only needs to attend to the content loss.
Subsequently, Dumoulin et al. proposed conditional instance normalization, which scales and shifts the parameters of the instance normalization layers to adjust the generated style, so that different styles can be generated. Similarly, there is adaptive instance normalization.
However, the mechanism behind these normalization methods has not yet been clarified.
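The three normalization variants named above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code of any of the cited papers: the function names, the (N, C, H, W) layout, the EPS constant, and the random feature maps are all assumptions made for the example.

```python
import numpy as np

EPS = 1e-5  # small constant for numerical stability (assumed value)

def instance_norm(x):
    """Normalize every channel of every sample over its own spatial dims.

    x has shape (N, C, H, W); the statistics are per (sample, channel).
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + EPS)

def batch_norm(x):
    """Plain batch normalization (no learned scale/shift), for comparison.

    The statistics are shared across the batch; with N == 1 this
    coincides with instance_norm.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + EPS)

def conditional_instance_norm(x, gammas, betas, style_id):
    """IN followed by a scale and shift selected by the style index.

    gammas and betas have shape (num_styles, C): one row per style.
    """
    g = gammas[style_id].reshape(1, -1, 1, 1)
    b = betas[style_id].reshape(1, -1, 1, 1)
    return g * instance_norm(x) + b

def adaptive_instance_norm(content, style):
    """Align the channel-wise mean/std of content features to the style's."""
    s_mean = style.mean(axis=(2, 3), keepdims=True)
    s_std = np.sqrt(style.var(axis=(2, 3), keepdims=True) + EPS)
    return s_std * instance_norm(content) + s_mean

# Hypothetical feature maps standing in for CNN activations.
rng = np.random.default_rng(0)
content = rng.normal(2.0, 3.0, size=(1, 4, 8, 8))
style = rng.normal(-1.0, 0.5, size=(1, 4, 8, 8))
stylized = adaptive_instance_norm(content, style)
print(stylized.mean(axis=(2, 3)))  # per-channel means now follow the style features
```

In this sketch the only difference between the conditional and adaptive variants is where the scale and shift come from: a learned table indexed by style in the first case, the statistics of a style feature map in the second.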
(3) Adversarial samples
Studies have found that some classification networks are easily deceived by adversarial samples, which lead them to incorrect results. As shown in Fig.14, adding just a small perturbation to the original image can keep the network from recognizing it correctly. This shows that there is a big gap between human visual perception and what the network perceives. Studying how adversarial samples affect stylization in NST is therefore also necessary.
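To make the idea of a small perturbation concrete, here is a toy sketch on a linear classifier rather than a real CNN. It uses a sign-of-gradient step in the spirit of the fast gradient sign method, a technique the text above does not name and which is swapped in here purely for illustration. The weights, the "image", and the factor 1.1 are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1000)             # toy linear classifier weights (hypothetical)
x = rng.uniform(0.0, 1.0, size=1000)  # toy flattened "image", pixels in [0, 1]

def score(v):
    """Classifier score; its gradient with respect to the input is simply w."""
    return float(w @ v)

def predict(v):
    """Class 1 if the score is positive, class 0 otherwise."""
    return int(score(v) > 0)

s = score(x)
# Moving every pixel by eps against the gradient sign changes the score by
# eps * sum(|w|), so this eps is just large enough to cross the boundary.
eps = 1.1 * abs(s) / np.abs(w).sum()
x_adv = x - eps * np.sign(s) * np.sign(w)  # not clipped to [0, 1] in this sketch

print("per-pixel change:", eps)
print("prediction before:", predict(x), "after:", predict(x_adv))
```

Each pixel moves by the same small amount, yet the many small changes add up along the gradient direction, which is why the decision flips even though the perturbed input stays close to the original.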
Part C Three-way Trade-Off
NST algorithms often require a trade-off between speed, flexibility, and quality. Among them, IOB-NST gives the best quality but at a huge computational cost;
PSPM-MOB-NST can meet the needs of real-time stylization but requires a separate network for each style, so its flexibility is poor;
MSPM-MOB-NST integrates multiple styles into a single network, but that network still has to be pre-trained for its set of styles;
ASPM-MOB-NST can transfer any style, but its quality and speed are still unsatisfactory, and it depends heavily on the diversity of the style training data.
In addition, hyperparameters are currently set by experience. For each different combination of content and style, the hyperparameters have to be reset and slowly fine-tuned, which is very time-consuming. Further research on optimization in NST is therefore needed, in order to find better local optima and improve the quality of the stylized image.
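As a rough illustration of why these hyperparameters matter, the sketch below writes the objective as a weighted sum of a content loss and a Gram-matrix style loss, in the spirit of the Gatys et al. formulation mentioned in the introduction. The weight names alpha and beta, the normalization constants, the random "feature maps", and the helper best_mix are all assumptions for the example; in a real system the features would come from a pre-trained CNN.

```python
import numpy as np

def gram_matrix(feat):
    """(C, H, W) feature map -> (C, C) matrix of channel correlations."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (h * w)

def content_loss(gen, content):
    """Mean squared difference between the feature maps themselves."""
    return float(np.mean((gen - content) ** 2))

def style_loss(gen, style):
    """Mean squared difference between the Gram matrices."""
    return float(np.mean((gram_matrix(gen) - gram_matrix(style)) ** 2))

def total_loss(gen, content, style, alpha, beta):
    """Weighted objective; alpha and beta are the hand-tuned hyperparameters."""
    return alpha * content_loss(gen, content) + beta * style_loss(gen, style)

def best_mix(content, style, alpha, beta, steps=11):
    """Hypothetical helper: among blends t*style + (1-t)*content,
    return the t with the lowest total loss."""
    ts = np.linspace(0.0, 1.0, steps)
    losses = [
        total_loss(t * style + (1 - t) * content, content, style, alpha, beta)
        for t in ts
    ]
    return float(ts[int(np.argmin(losses))])

# Random arrays standing in for CNN features of a content and a style image.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 8, 8))
style = rng.normal(size=(4, 8, 8))

for beta in (0.0, 0.1, 1.0, 10.0):
    print("beta =", beta, "best blend t =", best_mix(content, style, 1.0, beta))
```

Even in this toy setting the preferred result shifts as the ratio of beta to alpha changes, which gives a feel for why the weights usually have to be re-tuned for each content and style pair.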

Section IX Discussion and Conclusion

Table 5 summarizes the NST algorithms and can be read together with the classification diagram in Fig2 above.
Thanks to the dedication of many researchers in recent years, NST has made considerable progress. At present, the primary focus of NST is how to transfer various styles better. There are two main technical directions: one is to reduce the failure cases and raise the proportion and quality of successful results across styles; the other is to extend existing NST algorithms into more variants, such as the stylization of 3D surfaces.
Beyond merely imitating the artistic styles created by humans, NST can also be used to explore AI-driven artistic creation and the possibility of more style combinations.
