Project Summary 4: Art generation with Neural Style Transfer

1. Project introduction

Neural Style Transfer (NST) is one of the most interesting techniques in deep learning. It merges two images, a content image C and a style image S, to generate an image G that combines the content of C with the style of S.

 

2. Model

Using transfer learning, the model adopts a pre-trained VGG19 network. The pretrained weights come from MatConvNet (http://www.vlfeat.org/matconvnet/pretrained/). The model structure is as follows:

(1) Example diagram of model structure:

(2) The structure of the VGG19 network used in this project

{'input': <tf.Variable 'Variable:0' shape=(1, 300, 400, 3) dtype=float32_ref>,
 'conv1_1': <tf.Tensor 'Relu:0' shape=(1, 300, 400, 64) dtype=float32>,
 'conv1_2': <tf.Tensor 'Relu_1:0' shape=(1, 300, 400, 64) dtype=float32>,
 'avgpool1': <tf.Tensor 'AvgPool:0' shape=(1, 150, 200, 64) dtype=float32>,
 'conv2_1': <tf.Tensor 'Relu_2:0' shape=(1, 150, 200, 128) dtype=float32>,
 'conv2_2': <tf.Tensor 'Relu_3:0' shape=(1, 150, 200, 128) dtype=float32>,
 'avgpool2': <tf.Tensor 'AvgPool_1:0' shape=(1, 75, 100, 128) dtype=float32>,
 'conv3_1': <tf.Tensor 'Relu_4:0' shape=(1, 75, 100, 256) dtype=float32>,
 'conv3_2': <tf.Tensor 'Relu_5:0' shape=(1, 75, 100, 256) dtype=float32>,
 'conv3_3': <tf.Tensor 'Relu_6:0' shape=(1, 75, 100, 256) dtype=float32>,
 'conv3_4': <tf.Tensor 'Relu_7:0' shape=(1, 75, 100, 256) dtype=float32>,
 'avgpool3': <tf.Tensor 'AvgPool_2:0' shape=(1, 38, 50, 256) dtype=float32>,
 'conv4_1': <tf.Tensor 'Relu_8:0' shape=(1, 38, 50, 512) dtype=float32>,
 'conv4_2': <tf.Tensor 'Relu_9:0' shape=(1, 38, 50, 512) dtype=float32>,
 'conv4_3': <tf.Tensor 'Relu_10:0' shape=(1, 38, 50, 512) dtype=float32>,
 'conv4_4': <tf.Tensor 'Relu_11:0' shape=(1, 38, 50, 512) dtype=float32>,
 'avgpool4': <tf.Tensor 'AvgPool_3:0' shape=(1, 19, 25, 512) dtype=float32>,
 'conv5_1': <tf.Tensor 'Relu_12:0' shape=(1, 19, 25, 512) dtype=float32>,
 'conv5_2': <tf.Tensor 'Relu_13:0' shape=(1, 19, 25, 512) dtype=float32>,
 'conv5_3': <tf.Tensor 'Relu_14:0' shape=(1, 19, 25, 512) dtype=float32>,
 'conv5_4': <tf.Tensor 'Relu_15:0' shape=(1, 19, 25, 512) dtype=float32>,
 'avgpool5': <tf.Tensor 'AvgPool_4:0' shape=(1, 10, 13, 512) dtype=float32>}
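As a rough illustration of how such a layer dictionary is built (a sketch only, not the project's exact loading code): the generated image is the single tf.Variable named 'input', each conv layer applies frozen pretrained weights followed by ReLU, and max pooling is replaced by average pooling. Random weights stand in here for the MatConvNet weights.

import numpy as np
import tensorflow as tf   # TensorFlow 1.x, matching the tensors listed above

graph = {}
# The generated image is the only trainable variable in the graph
graph['input'] = tf.Variable(np.zeros((1, 300, 400, 3)), dtype=tf.float32)

# Pretrained kernel and bias would normally be read from the .mat file;
# random constants are used here purely for illustration (frozen, not trainable)
W = tf.constant(np.random.randn(3, 3, 3, 64), dtype=tf.float32)
b = tf.constant(np.random.randn(64), dtype=tf.float32)

# conv1_1: convolution + ReLU, shape (1, 300, 400, 64)
graph['conv1_1'] = tf.nn.relu(
    tf.nn.conv2d(graph['input'], W, strides=[1, 1, 1, 1], padding='SAME') + b)

# avgpool1: average pooling halves the spatial size, shape (1, 150, 200, 64)
graph['avgpool1'] = tf.nn.avg_pool(
    graph['conv1_1'], ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')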

  

3. Cost function

(1) Content cost function

  • First, unroll the activations of a layer from a 3D volume (n_H × n_W × n_C) into a 2D matrix.

  • Calculate the content cost function. With the content image C and the generated image G as input, if the activations of a chosen layer of the network are similar for both images, the content of the two images is similar (see the code sketch after this list):

      J_content(C, G) = 1 / (4 * n_H * n_W * n_C) * Σ_{all entries} (a^(C) - a^(G))^2

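A minimal sketch of the content cost in TensorFlow 1.x, assuming a_C and a_G are the activations of one chosen layer (for example 'conv4_2') for the content image and the generated image:

import tensorflow as tf

def compute_content_cost(a_C, a_G):
    # Retrieve the dimensions of the activation volume
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Unroll each 3D volume of activations into a 2D matrix
    a_C_unrolled = tf.reshape(a_C, [-1, n_C])
    a_G_unrolled = tf.reshape(a_G, [-1, n_C])

    # Sum of squared differences with the usual normalization constant
    J_content = tf.reduce_sum(tf.square(a_C_unrolled - a_G_unrolled)) / (4 * n_H * n_W * n_C)
    return J_content

In practice a_C is evaluated once (a constant numpy array) while a_G stays symbolic, so the cost depends only on the generated image.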
(2) Style cost function

  • First calculate the Gram matrix of one layer's activations:

      G_kk' = Σ_{i,j} a_ijk * a_ijk'   (equivalently G = A * A^T, where A is the (n_C, n_H*n_W) matrix of unrolled activations)

  • Compute the style cost of one layer. With the style image S and the generated image G as input, if the correlations between the channel activations of that layer (captured by the Gram matrices) are similar, the style of the two images is similar:

      J_style^[l](S, G) = 1 / (4 * n_C^2 * (n_H * n_W)^2) * Σ_{k,k'} (G^(S)_kk' - G^(G)_kk')^2

  • In practice, combining the style costs of several layers (weighted by coefficients λ^[l]) gives better results. It is calculated as follows:

      J_style(S, G) = Σ_l λ^[l] * J_style^[l](S, G)

  • Combining the content cost function and the style cost function yields the total cost function (a code sketch for the style and total costs follows this list):

      J(G) = α * J_content(C, G) + β * J_style(S, G), where α and β are weighting hyperparameters

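A minimal TensorFlow 1.x sketch of the style and total costs described above. The layer list and weights in STYLE_LAYERS and the α/β defaults are illustrative choices commonly used for this exercise, not values stated in this summary:

import tensorflow as tf

def gram_matrix(A):
    # Gram matrix G = A * A^T for an unrolled activation matrix A of shape (n_C, n_H*n_W)
    return tf.matmul(A, tf.transpose(A))

def compute_layer_style_cost(a_S, a_G):
    # Style cost of one layer, from activations of the style image (a_S) and generated image (a_G)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Unroll to shape (n_C, n_H*n_W) so that rows correspond to channels
    a_S = tf.transpose(tf.reshape(a_S, [n_H * n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    return tf.reduce_sum(tf.square(GS - GG)) / (4 * (n_C ** 2) * ((n_H * n_W) ** 2))

# Illustrative layer weights lambda^[l]; equal weights over several layers are a common choice
STYLE_LAYERS = [('conv1_1', 0.2), ('conv2_1', 0.2), ('conv3_1', 0.2),
                ('conv4_1', 0.2), ('conv5_1', 0.2)]

def compute_style_cost(sess, model, style_layers):
    # Weighted sum of per-layer style costs; assumes the style image has already
    # been assigned to model['input'] so that sess.run(out) returns its activations
    J_style = 0
    for layer_name, coeff in style_layers:
        out = model[layer_name]
        a_S = sess.run(out)   # style activations (constant)
        a_G = out             # generated activations (stay symbolic)
        J_style += coeff * compute_layer_style_cost(a_S, a_G)
    return J_style

def total_cost(J_content, J_style, alpha=10, beta=40):
    # Total cost J(G) = alpha * J_content + beta * J_style
    return alpha * J_content + beta * J_style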
 

4. Model optimization algorithm and training objectives

# define the Adam optimizer with a learning rate of 2.0
optimizer = tf.train.AdamOptimizer(2.0)

# define the training step: minimize the total cost J with respect to the generated image
train_step = optimizer.minimize(J)
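A hedged sketch of the training loop that goes with the optimizer above, following the standard TensorFlow 1.x pattern. The function name, the iteration count, and the logging interval are assumptions for illustration:

import tensorflow as tf

def model_nn(sess, model, train_step, J, input_image, num_iterations=200):
    # Initialize all variables and feed the initial generated image into the graph
    sess.run(tf.global_variables_initializer())
    sess.run(model['input'].assign(input_image))

    for i in range(num_iterations):
        # Each optimizer step updates model['input'] (the generated image),
        # since it is the only tf.Variable that the cost J depends on
        sess.run(train_step)

        if i % 20 == 0:
            print("Iteration %d: cost = %f" % (i, sess.run(J)))

    # Read back the optimized generated image
    return sess.run(model['input'])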

 

5. Input and output data

  • Input data: content_image, style_image, generated_image (the generated image is initialized, e.g. from the content image plus noise, and fed in as the trainable input variable; see the sketch below)
  • Output data: generated_image (after optimization)
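A minimal sketch of how the input images might be prepared to the (1, 300, 400, 3) shape shown in section 2: resize, add a batch dimension, subtract the VGG mean pixel, and initialize the generated image as the content image plus noise. The file paths and the noise ratio are illustrative assumptions:

import numpy as np
from PIL import Image

# Mean RGB pixel values commonly used to normalize inputs for VGG networks
VGG_MEANS = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

def preprocess_image(path, size=(400, 300)):
    # Resize to 400x300 (width x height), add a batch dimension, subtract the VGG means
    image = np.array(Image.open(path).convert('RGB').resize(size), dtype=np.float32)
    image = np.expand_dims(image, axis=0)   # shape (1, 300, 400, 3)
    return image - VGG_MEANS

content_image = preprocess_image("images/content.jpg")   # hypothetical paths
style_image = preprocess_image("images/style.jpg")

# Initialize the generated image as the content image plus random noise
noise = np.random.uniform(-20, 20, content_image.shape).astype(np.float32)
generated_image = 0.6 * noise + 0.4 * content_image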

 

6. Summary

  • Neural Style Transfer is an algorithm that, given a content image C and a style image S, generates an artistic image G.
  • It uses representations (hidden layer activations) based on a pretrained ConvNet.
  • The content cost function is computed using one hidden layer's activations.
  • The style cost function for one layer is computed using the Gram matrix of that layer's activations. The overall style cost function is obtained using several hidden layers.
  • Optimizing the total cost function results in synthesizing new images.

 
