Graduation Project: Deep Learning Image Style Transfer

0 Introduction

Today, I would like to introduce a machine vision project to you:

Image Style Transfer Based on a Deep Learning Convolutional Neural Network

Image style transfer means applying the style of one image to another image, as shown in the figure below:

[figure: example of style transfer, a content image rendered with the texture of a style image]
After a series of feature transformations, the original image has new texture features, which is called style transfer.

1 VGG network

Before implementing style transfer, you need a brief understanding of the VGG network (VGG stacks convolutional layers to extract features and achieves high image-recognition accuracy, which is why we use it here for style transfer).

[figure: VGG network configurations, columns A through E]
As shown in the figure above, columns A through E each describe a VGG configuration: VGG-11, VGG-13, VGG-16, and VGG-19. As shown in the figure below, an image passed through the VGG-19 structure ends up with a classification result.

[figure: an input image passing through VGG-19 to produce a classification]

2 Style Transfer

To transfer the style of an image, two requirements must be met:

  • The generated image needs to have the content characteristics of the original image
  • The generated image needs to have the texture characteristics of the style picture

According to these two points, two loss values are needed to achieve style transfer:
one measures the difference between the content features of the generated image and those of the original (content) image, and the other measures the difference between the texture features of the generated image and those of the style image.

To extract different kinds of features (content features and texture features) from an image, you only need to use different convolutional structures. At this point we need to use two neural networks.

Back to VGG: the VGG network repeatedly applies convolutional layers to extract features and then uses those features to classify objects, so the parameters it learned for extracting content and texture features can be transferred. We therefore extract the features of the generated image with the VGG network, and then compute the feature losses for content and for texture separately.

[figure: style-transfer pipeline, where the input image x passes through the Image Transform Net fw to produce y, which is compared with the style target ys and the content target yc]
As shown in the figure, suppose the initial image x (the input image) is random; passing it through the fw network (Image Transform Net) produces the generated image y.
The features of y are then compared with the features of the style image ys to obtain loss_style, and with the features of the content image yc to obtain loss_content. Taking loss = loss_style + loss_content, the parameters of fw can be trained.
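
Written out, and anticipating the weighted form used in the code later in this post (where ALPHA and BETA play the roles of the two weights), the objective is roughly:

$$loss = \alpha \cdot loss_{content} + \beta \cdot loss_{style}$$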

Now look at a figure that is widely circulated online:

[figure: the loss network (VGG-16) used to compute content and style losses]
Compared with the earlier figure, this one refines how the losses are evaluated inside the VGG network.

The refined loss splits into two parts:

  • (1) Content loss
  • (2) Style loss

3 Content loss

Since the model used in the figure above is VGG-16, this amounts to computing a loss between the features that the two images produce at relu3-3 of VGG-16. The loss function is as follows:

[figure: content loss formula]

In short, let φ(y) be the feature map extracted from the content image y and φ(ŷ) the feature map of the generated image ŷ, and let c, h, and w be the number of channels, the height, and the width of φ. Then:

$$loss_{content} = \frac{1}{c \cdot h \cdot w} \sum \left( \phi(\hat{y}) - \phi(y) \right)^2$$

Code:

def content_loss(content_img, rand_img):
    content_layers = [('relu3_3', 1.0)]
    content_loss = 0.0
    # Iterate over the VGG layers (and weights) used to measure content loss
    for layer_name, weight in content_layers:

        # Compute the feature maps of both images at this layer
        p = get_vgg(content_img, layer_name)
        x = get_vgg(rand_img, layer_name)
        # height x width x channels
        M = p.shape[1] * p.shape[2] * p.shape[3]

        # Accumulate the loss according to the formula above
        content_loss += (1.0 / M) * tf.reduce_sum(tf.pow(p - x, 2)) * weight

    # Average the loss over the layers
    content_loss /= len(content_layers)
    return content_loss
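
The get_vgg helper is not defined in the post; a minimal sketch of what it could look like, assuming the vgg19() network built in section 6 and an active session sess, is:

# Hypothetical helper (not part of the original code): return the feature map
# of `img` at layer `layer_name`, using the rebuilt VGG network from section 6.
def get_vgg(img, layer_name):
    sess.run(tf.assign(model['input'], img))
    return sess.run(model[layer_name])

Note that for the generated image this only works as a sketch: in the final code of section 6 the generated image is the network's input variable itself, so its features stay symbolic and gradients can flow through them.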

4 Style loss

The style loss is computed from features at several layers; first you need to compute the Gram matrix.

$$G = F^{\top} F, \qquad G_{ij} = \sum_{k} F_{ki} F_{kj}$$

where F is the layer's feature map reshaped to a (h·w) × c matrix, so G is a c × c matrix.
The Gram matrix can be seen as an uncentered covariance matrix between features (a covariance matrix computed without subtracting the mean). In a feature map, each number is the response of a specific filter at a specific position, so each number represents the strength of a feature. What the Gram matrix measures is the correlation between pairs of features: which features tend to appear together and which trade off against each other. At the same time, the diagonal elements of the Gram matrix reflect how strongly each feature appears in the image. The Gram matrix therefore captures the overall style of an image. With the Gram matrix representing style, measuring the difference between two image styles reduces to comparing their Gram matrices. The per-layer style loss is therefore defined as:

$$E_l = \frac{1}{4 N_l^{2} M_l^{2}} \sum_{i,j} \left( G_{ij}^{l} - A_{ij}^{l} \right)^2$$

where, at layer l, A is the Gram matrix of the style image, G is the Gram matrix of the generated image, M_l = h·w and N_l is the number of channels.
In practice, the style loss is taken at several layers from low to high, for example the 2nd, 4th, 7th, and 10th convolutional layers of VGG-16, and the per-layer style losses are then combined.
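
As a sketch consistent with the style_loss code below (where w_l is the weight of layer l and E_l is the per-layer loss defined above; the code additionally averages over the number of layers):

$$loss_{style} = \sum_{l} w_l \cdot E_l$$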

[figure: the full objective, a weighted combination of content loss, style loss, and a third term, the total variation loss]
The third term is optional and is called the Total Variation Loss. It is a smoothing (regularization) term whose purpose is to make the generated image locally smooth; its definition is very similar to the smoothing term used in Markov Random Fields (MRF), where y_{n+1} denotes a neighboring pixel of y_n.
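
The post does not implement this term; a minimal sketch of a total variation loss, assuming the generated image has shape [1, height, width, channels], could be:

# Sketch of a total variation loss (optional, not used in the code below):
# it penalizes differences between neighboring pixels so that the generated
# image stays locally smooth.
def total_variation_loss(img):
    dy = img[:, 1:, :, :] - img[:, :-1, :, :]   # differences between vertical neighbors
    dx = img[:, :, 1:, :] - img[:, :, :-1, :]   # differences between horizontal neighbors
    return tf.reduce_sum(tf.pow(dy, 2)) + tf.reduce_sum(tf.pow(dx, 2))

TensorFlow also ships tf.image.total_variation, which computes a similar quantity from absolute differences.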

The code implements the above functions:

# Compute the Gram matrix
def gram(x, size, deep):
    x = tf.reshape(x, (size, deep))
    g = tf.matmul(tf.transpose(x), x)
    return g
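
For example (an illustrative usage, with the shapes used by style_loss below): a feature map of shape (1, h, w, c) is passed in with size = h * w and deep = c, so the resulting Gram matrix has shape (c, c):

feature = tf.random_normal([1, 75, 113, 256])   # e.g. a relu3_3-sized feature map (illustrative shape)
G = gram(feature, 75 * 113, 256)                # G has shape (256, 256)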

def style_loss(style_img, rand_img):
    style_layers = [('relu1_2', 0.25), ('relu2_2', 0.25), ('relu3_3', 0.25), ('relu4_3', 0.25)]
    style_loss = 0.0
    # Iterate over the VGG layers (and weights) used to measure style loss
    for layer_name, weight in style_layers:

        # Compute the feature maps of both images at this layer
        a = get_vgg(style_img, layer_name)
        x = get_vgg(rand_img, layer_name)

        # height x width
        M = a.shape[1] * a.shape[2]
        # number of channels
        N = a.shape[3]

        # Compute the Gram matrices
        A = gram(a, M, N)
        G = gram(x, M, N)

        # Accumulate the loss according to the formula above
        style_loss += (1.0 / (4 * M * M * N * N)) * tf.reduce_sum(tf.pow(G - A, 2)) * weight

    # Average the loss over the layers
    style_loss /= len(style_layers)
    return style_loss

5 Main code implementation

The code implementation is mainly divided into 4 steps:

  • 1. Generate a random initial image
  • 2. Read the content and style images
  • 3. Compute the total loss
  • 4. Train, adjusting the generated image to minimize the loss

def main():
    with tf.Session() as sess:

        content_img = cv2.imread('content.jpg')
        style_img = cv2.imread('style.jpg')

        # The generated image, initialized randomly as a trainable variable
        rand_img = tf.Variable(random_img(HEIGHT, WIGHT), dtype=tf.float32)

        # Compute the total loss
        cost = ALPHA * content_loss(content_img, rand_img) + BETA * style_loss(style_img, rand_img)
        optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(cost)

        sess.run(tf.global_variables_initializer())

        for step in range(TRAIN_STEPS):
            # Train: update the generated image to minimize the loss
            sess.run(optimizer)

            if step % 50 == 0:
                img = sess.run(rand_img)
                img = np.clip(img, 0, 255).astype(np.uint8)
                name = OUTPUT_IMAGE + "//" + str(step) + ".jpg"
                cv2.imwrite(name, img[0])

6 Transfer Model Implementation

Since computing the loss requires obtaining feature values at several network layers and combining them in a weighted sum, the VGG network has to be rebuilt on the basis of the existing VGG network and its parameters.
Note: the VGG-19 network is used here.

Before rebuilding, first download the pre-trained VGG-19 model so that its trained parameters can be extracted and reused in the reconstructed VGG-19 network.

[figure: the pre-trained VGG-19 model file, imagenet-vgg-verydeep-19.mat]
After downloading the .mat file, you can rebuild the network. The structure of VGG-19 corresponds to column E of the configuration figure in section 1, so the network can be reconstructed following that structure:

[figure: the structure of the rebuilt VGG-19 network]
Reconstruction means creating a new neural network with the same structure as the VGG-19 model, extracting the trained parameters to use as the parameters of the new network, and setting them as constants that cannot be changed.

def vgg19():
    layers=(
        'conv1_1','relu1_1','conv1_2','relu1_2','pool1',
        'conv2_1','relu2_1','conv2_2','relu2_2','pool2',
        'conv3_1','relu3_1','conv3_2','relu3_2','conv3_3','relu3_3','conv3_4','relu3_4','pool3',
        'conv4_1','relu4_1','conv4_2','relu4_2','conv4_3','relu4_3','conv4_4','relu4_4','pool4',
        'conv5_1','relu5_1','conv5_2','relu5_2','conv5_3','relu5_3','conv5_4','relu5_4','pool5'
    )
    vgg = scipy.io.loadmat('D://python//imagenet-vgg-verydeep-19.mat')
    weights = vgg['layers'][0]

    network = {}
    net = tf.Variable(np.zeros([1, 300, 450, 3]), dtype=tf.float32)
    network['input'] = net
    for i,name in enumerate(layers):
        layer_type=name[:4]
        if layer_type=='conv':
            kernels = weights[i][0][0][0][0][0]
            bias = weights[i][0][0][0][0][1]
            conv=tf.nn.conv2d(net,tf.constant(kernels),strides=(1,1,1,1),padding='SAME',name=name)
            net=tf.nn.relu(conv + bias)
        elif layer_type=='pool':
            net=tf.nn.max_pool(net,ksize=(1,2,2,1),strides=(1,2,2,1),padding='SAME')
        network[name]=net
    return network
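
As a quick sanity check (a usage sketch, not part of the original code), you can build the network and list the layer outputs it exposes:

model = vgg19()
# The dictionary maps every layer name to its output tensor,
# plus the 'input' variable that images are assigned to.
print(sorted(model.keys()))
print(model['relu3_3'])   # a feature tensor downsampled twice from the 300x450 input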

Since the content and style features do not change during training, to save training time these feature values are computed once before training starts (this is encapsulated in the get_neck() function in the code below).

The overall code is as follows:

import tensorflow as tf
import numpy as np
import scipy.io
import cv2
import scipy.misc

HEIGHT = 300
WIGHT = 450
LEARNING_RATE = 1.0
NOISE = 0.5
ALPHA = 1
BETA = 500

TRAIN_STEPS = 200

OUTPUT_IMAGE = "D://python//img"
STYLE_LAUERS = [('conv1_1', 0.2), ('conv2_1', 0.2), ('conv3_1', 0.2), ('conv4_1', 0.2), ('conv5_1', 0.2)]
CONTENT_LAYERS = [('conv4_2', 0.5), ('conv5_2',0.5)]


def vgg19():
    layers=(
        'conv1_1','relu1_1','conv1_2','relu1_2','pool1',
        'conv2_1','relu2_1','conv2_2','relu2_2','pool2',
        'conv3_1','relu3_1','conv3_2','relu3_2','conv3_3','relu3_3','conv3_4','relu3_4','pool3',
        'conv4_1','relu4_1','conv4_2','relu4_2','conv4_3','relu4_3','conv4_4','relu4_4','pool4',
        'conv5_1','relu5_1','conv5_2','relu5_2','conv5_3','relu5_3','conv5_4','relu5_4','pool5'
    )
    vgg = scipy.io.loadmat('D://python//imagenet-vgg-verydeep-19.mat')
    weights = vgg['layers'][0]

    network = {}
    net = tf.Variable(np.zeros([1, 300, 450, 3]), dtype=tf.float32)
    network['input'] = net
    for i,name in enumerate(layers):
        layer_type=name[:4]
        if layer_type=='conv':
            kernels = weights[i][0][0][0][0][0]
            bias = weights[i][0][0][0][0][1]
            conv=tf.nn.conv2d(net,tf.constant(kernels),strides=(1,1,1,1),padding='SAME',name=name)
            net=tf.nn.relu(conv + bias)
        elif layer_type=='pool':
            net=tf.nn.max_pool(net,ksize=(1,2,2,1),strides=(1,2,2,1),padding='SAME')
        network[name]=net
    return network


# Compute the Gram matrix
def gram(x, size, deep):
    x = tf.reshape(x, (size, deep))
    g = tf.matmul(tf.transpose(x), x)
    return g


def style_loss(sess, style_neck, model):
    style_loss = 0.0
    # Iterate over the VGG layers (and weights) used to measure style loss
    for layer_name, weight in STYLE_LAUERS:
        # Feature maps of the style image (precomputed) and the generated image
        a = style_neck[layer_name]
        x = model[layer_name]
        # height x width
        M = a.shape[1] * a.shape[2]
        # number of channels
        N = a.shape[3]

        # Compute the Gram matrices
        A = gram(a, M, N)
        G = gram(x, M, N)

        # Accumulate the loss according to the formula
        style_loss += (1.0 / (4 * M * M * N * N)) * tf.reduce_sum(tf.pow(G - A, 2)) * weight

    # Average the loss over the layers
    style_loss /= len(STYLE_LAUERS)
    return style_loss


def content_loss(sess, content_neck, model):
    content_loss = 0.0
    # Iterate over the VGG layers (and weights) used to measure content loss
    for layer_name, weight in CONTENT_LAYERS:
        # Feature maps of the content image (precomputed) and the generated image
        p = content_neck[layer_name]
        x = model[layer_name]
        # height x width
        M = p.shape[1] * p.shape[2]
        # number of channels
        N = p.shape[3]

        # Accumulate the loss according to the formula
        lss = 1.0 / (M * N)
        content_loss += lss * tf.reduce_sum(tf.pow(p - x, 2)) * weight

    # Average the loss over the layers
    content_loss /= len(CONTENT_LAYERS)
    return content_loss


def random_img(height, weight, content_img):
    noise_image = np.random.uniform(-20, 20, [1, height, weight, 3])
    random_img = noise_image * NOISE + content_img * (1 - NOISE)
    return random_img


def get_neck(sess, model, content_img, style_img):
    # Feed the content image through the network and cache its feature maps
    sess.run(tf.assign(model['input'], content_img))
    content_neck = {}
    for layer_name, weight in CONTENT_LAYERS:
        p = sess.run(model[layer_name])
        content_neck[layer_name] = p
    # Feed the style image through the network and cache its feature maps
    sess.run(tf.assign(model['input'], style_img))
    style_content = {}
    for layer_name, weight in STYLE_LAUERS:
        a = sess.run(model[layer_name])
        style_content[layer_name] = a
    return content_neck, style_content


def main():
    model = vgg19()
    content_img = cv2.imread('D://a//content1.jpg')
    content_img = cv2.resize(content_img, (450, 300))
    content_img = np.reshape(content_img, (1, 300, 450, 3)) - [128.0, 128.0, 128.0]
    style_img = cv2.imread('D://a//style1.jpg')
    style_img = cv2.resize(style_img, (450, 300))
    style_img = np.reshape(style_img, (1, 300, 450, 3)) - [128.0, 128.0, 128.0]

    # Generate the initial image
    rand_img = random_img(HEIGHT, WIGHT, content_img)

    with tf.Session() as sess:
        # Precompute the target features, then build the loss
        content_neck, style_neck = get_neck(sess, model, content_img, style_img)
        cost = ALPHA * content_loss(sess, content_neck, model) + BETA * style_loss(sess, style_neck, model)
        optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(cost)

        sess.run(tf.global_variables_initializer())
        sess.run(tf.assign(model['input'], rand_img))
        for step in range(TRAIN_STEPS):
            print(step)
            # Train: update the network input (the generated image) to minimize the loss
            sess.run(optimizer)

            if step % 10 == 0:
                img = sess.run(model['input'])
                img += [128, 128, 128]
                img = np.clip(img, 0, 255).astype(np.uint8)
                name = OUTPUT_IMAGE + "//" + str(step) + ".jpg"
                img = img[0]
                cv2.imwrite(name, img)

        img = sess.run(model['input'])
        img += [128, 128, 128]
        img = np.clip(img, 0, 255).astype(np.uint8)
        cv2.imwrite("D://end.jpg", img[0])

main()

7 Results

[figure: style transfer results]

8 Finally


Origin blog.csdn.net/HUXINY/article/details/129459909