A detailed analysis of CNNs based on PyTorch, through a worked example

1. Introduction

1) Disclaimer

Following common practice, let me start with a disclaimer: this article is only my own understanding from learning about convolutional neural networks. It certainly contains inaccuracies and perhaps outright mistakes; criticism and corrections are welcome so that we can make progress together.

2) Some of my own understanding

My attitude towards learning has always been "learn to run first, and only then learn to walk". For CNNs and deep learning theory you can find plenty of learning materials and training courses online, but they tend to repeat one another: they tell you what something is without explaining why, and they open with piles of complicated theory without ever capturing the essence of CNNs.
A convolutional neural network, or any neural network model, is essentially a stack of linear operations and nonlinear activations combined to approximate a desired objective function, so that a given input produces a correct (or close to correct) output. Its nature is not complicated.
The reason many of the deep learning models we see are very complicated is that the real problems they solve are very complicated, just as a palace looks complicated yet is essentially built from simple bricks.
The main purpose of this article is to use a simplified example to dig into convolutional neural networks (CNNs).

3) What you need to know before reading this article

This article will not repeat any details of convolutional neural network theory. You only need to understand the most basic convolutional layer operations and have a basic grasp of PyTorch programming to follow it.

2. Examples

1) What problem to solve

We want to build a convolutional neural network that does the following: for a given 3×3 matrix, if all the elements of the matrix are the same, such as
[6, 6, 6]
[6, 6, 6]
[6, 6, 6]
then output "1" (True). If the elements are not all the same (e.g. random numbers), output "0" (False).

2) CNN network model training data

The training set contains 100 true samples (Train_set_true): constant 3×3 matrices whose value runs from -50 to 49:
[-50, -50, -50]  [-49, -49, -49]  [-48, -48, -48]  ...  [0, 0, 0]  ...  [49, 49, 49]
[-50, -50, -50]  [-49, -49, -49]  [-48, -48, -48]  ...  [0, 0, 0]  ...  [49, 49, 49]
[-50, -50, -50]  [-49, -49, -49]  [-48, -48, -48]  ...  [0, 0, 0]  ...  [49, 49, 49]
There are also 100 false samples (Train_set_false): 3×3 matrices filled with random values generated with torch.rand() (scaled to the range -50 to 50).
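
A minimal sketch of how these two training sets can be generated, mirroring the full listing at the end of the article:

import torch
import numpy

train_base = numpy.ones([1, 1, 3, 3])
train_set_true = torch.tensor([i * train_base for i in range(-50, 50)])   # shape (100, 1, 1, 3, 3), constant matrices
train_set_false = torch.rand(100, 1, 1, 3, 3) * 100 - 50                  # shape (100, 1, 1, 3, 3), random values in [-50, 50)
print(train_set_true.shape, train_set_false.shape)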

3) CNN network model

  • Model structure: convolutional layer (stride 1, 2×2 kernel) + LeakyReLU + convolutional layer (stride 1, 2×2 kernel) + Sigmoid. Since the problem is not complicated, a very simple model structure is enough (see the shape walk-through sketched after this list);
  • Loss function: this example is a typical binary classification problem (a yes/no judgement), so the BCE (Binary Cross-Entropy) loss is used. There are two losses in this example: the loss on samples that should be judged true (loss_true) and the loss on samples that should be judged false (loss_false);
  • Optimization function: Adadelta (I recommend an article introducing the various optimizers: Introduction to optimization functions). Why Adadelta here? In fact other optimizers would give similar results, because this example really is very simple.
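
The following is a minimal sketch (my own, not part of the original listing) of how the 3×3 input shrinks to a single probability as it passes through this stack; the full model class is in the source code at the end:

import torch

conv1 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2)   # 3x3 -> 2x2
act = torch.nn.LeakyReLU()
conv2 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2)   # 2x2 -> 1x1
sig = torch.nn.Sigmoid()

x = torch.full((1, 1, 3, 3), 6.0)        # (batch, channel, height, width)
print(conv1(x).shape)                    # torch.Size([1, 1, 2, 2])
print(sig(conv2(act(conv1(x)))).shape)   # torch.Size([1, 1, 1, 1]) -> one probability per input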

Of course, no real-world CNN application is this simple, but the example can be regarded as an abstraction of a real problem. The 200 input training matrices can be thought of as 200 real-life photos: the 100 matrices whose elements are all equal are photos that contain a "pedestrian", and the 100 randomly generated matrices are photos without a "pedestrian". All we have to do is build a convolutional neural network model that can determine the probability that a photo contains a "pedestrian".

3. CNN network model code based on PyTorch

The full source code is attached at the end of this article.

4. In-depth analysis

1) How to determine the learning rate and epoch?

As explained at the beginning of this article, a CNN model is closer to an engineering artefact: there is no theoretical recipe for choosing the learning rate and the number of epochs. In my view it comes down to trial and error.
The following is the result with learning rate = 0.5 and epoch = 200 (blue is the loss on true samples (loss_true), red is the loss on false samples (loss_false)).

2) See what the weights look like

Use the .state_dict() function to print the weights every 20 epochs; you can see the weight and bias of each convolutional layer. (To save space, only the final weight values are listed here.)


OrderedDict([('model.0.weight', tensor([[[[-0.4180,  0.8206],
          [ 0.5418, -0.9463]]]])), ('model.0.bias', tensor([0.0090])), ('model.2.weight', tensor([[[[-1.0951, -0.4438],
          [-0.7047, -0.8174]]]])), ('model.2.bias', tensor([6.7611]))])

For a practical network model the weight file is very large and there is nothing much to "see". But for this simple model you can print the weight values and even work through the calculation by hand.
From the weights of this example you can see that the first convolutional layer does the main work: the sum of its four kernel elements is approximately 0. If the elements of the input matrix are all (nearly) equal, the output of the first convolution is almost all zeros; the second convolution then adds a relatively large positive bias, and after the sigmoid the result is about 1. If the input elements differ widely (the randomly generated case), the first convolution produces outputs of large magnitude, and after the LeakyReLU the large positive ones dominate; since the kernel elements of the second convolution are all negative with large absolute values, its output is a large negative number (the positive bias is not enough to compensate), which becomes essentially 0 after the sigmoid.
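
As a sanity check on this reasoning, here is a minimal sketch that replays the forward pass by hand with the printed weights (the two input matrices are just examples, and the results are only approximate because the printed weights are rounded):

import torch

w1 = torch.tensor([[[[-0.4180,  0.8206],
                     [ 0.5418, -0.9463]]]])
b1 = torch.tensor([0.0090])
w2 = torch.tensor([[[[-1.0951, -0.4438],
                     [-0.7047, -0.8174]]]])
b2 = torch.tensor([6.7611])

def forward_by_hand(x):
    h = torch.nn.functional.conv2d(x, w1, b1)    # first conv: kernel elements sum to ~0
    h = torch.nn.functional.leaky_relu(h)
    h = torch.nn.functional.conv2d(h, w2, b2)    # second conv: large negative kernel, large positive bias
    return torch.sigmoid(h)

x_true = torch.full((1, 1, 3, 3), 6.0)                                             # all elements equal
x_false = torch.tensor([[[[10., -43., 7.], [49., 50., -51.], [39., -59., 71.]]]])  # random-looking
print(forward_by_hand(x_true))    # close to 1
print(forward_by_hand(x_false))   # close to 0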

3) Understanding loss.backward()

Backpropagation is really just taking the partial derivative of a function with respect to its variables, which a simple example makes clear.

import torch

a = torch.tensor([1],dtype=float,requires_grad=True)
b = torch.tensor([4],dtype=float,requires_grad=True)
c = a**3 + b
c.backward()

print(a.grad)
print(b.grad)
------------------------------ output ------------------------------------
tensor([3.], dtype=torch.float64)
tensor([1.], dtype=torch.float64)

The partial derivatives of the loss function are what the following .step() call uses to move the weights toward the minimum of the loss.

4) Understand optimize.step()

Use the partial derivative values obtained in the previous step to update the weights. How the update is performed depends on the type of optimizer (SGD, Momentum, Adam, ...); taking stochastic gradient descent (SGD) as an example:

  • If the partial derivative is negative, increase the weight by one step; if it is positive, decrease it by one step;
  • Step size = learning rate × absolute value of the partial derivative.

This is just high-school mathematics.
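
Here is a minimal sketch of that update rule on a one-parameter toy loss (plain SGD done by hand, not the Adadelta update used in this article's example):

import torch

w = torch.tensor([2.0], requires_grad=True)
loss = (w - 5.0) ** 2          # toy loss with its minimum at w = 5
loss.backward()                # dL/dw = 2*(w - 5) = -6, negative -> w should increase

lr = 0.1
with torch.no_grad():
    w -= lr * w.grad           # 2.0 - 0.1*(-6.0) = 2.6, one step toward the minimum
print(w)                       # tensor([2.6000], requires_grad=True)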

5) Why use zero_grad()?

This is a very confusing operation for novices, but it is easy to explain. As mentioned above, .step() "uses the partial derivative values obtained in the previous step to update the weights". So what happens if there were several backward passes ( .backward() ) before the .step()? Does it take the latest one?
In fact, PyTorch accumulates (sums) the gradients of all those backward passes, and .step() then uses the accumulated sum, which is clearly not what we want. So we use zero_grad() to clear the gradients left over from earlier backward passes, so that the weights are updated only according to the gradient of the current one.

You can try commenting out zero_grad() and see whether the loss still converges.
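
A minimal sketch of that accumulation behaviour, extending the .backward() toy example above:

import torch

a = torch.tensor([1.0], requires_grad=True)

(a * 3).backward()
print(a.grad)          # tensor([3.])

(a * 3).backward()     # without zeroing, the new gradient is ADDED to the old one
print(a.grad)          # tensor([6.])

a.grad.zero_()         # this is what optimizer.zero_grad() does for every parameter
(a * 3).backward()
print(a.grad)          # tensor([3.]) again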

Finally, here is the source code:

import torch
import matplotlib.pyplot as plt
import numpy

class CNN(torch.nn.Module):

    def __init__(self):
        super(CNN,self).__init__()
        self.model = torch.nn.Sequential(

            torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2),
            torch.nn.LeakyReLU(),

            torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2),
            torch.nn.Sigmoid()   # the last activation must be a sigmoid, otherwise the input to the BCE loss may fall outside 0~1
        )

    def forward(self, x):
        return self.model(x)

cnn = CNN()



train_base = numpy.ones([1,1,3,3])
train_set_true = torch.tensor([i*train_base for i in range(-50,50,1)])  # generate a (100, 1, 1, 3, 3) training set of true samples: constant matrices that follow the rule
train_set_false = torch.rand(100,1,1,3,3)*100-50  # generate a (100, 1, 1, 3, 3) training set of false samples: random noise

train_set_true = train_set_true.to(torch.float64)
train_set_false = train_set_false.to(torch.float64)

train_target_true = torch.ones(1,1,1,1)  # training target for true samples: 1
train_target_false = torch.zeros(1,1,1,1)  # training target for false samples: 0

cnn_loss = torch.nn.BCELoss()   # the loss function is binary cross-entropy

cnn_opt = torch.optim.Adadelta(cnn.parameters(),lr=0.5)   # the optimizer for the CNN weights is Adadelta

epoch = 200

for i in range(epoch):

    for iteration,train_sample_true in enumerate(train_set_true):

        cnn_opt.zero_grad()   # training usually uses mini-batches; without zeroing, gradients would carry over from the previous batch, so this call must come before the backward pass and the optimizer step.
        cnn_output_true = cnn(train_sample_true.to(torch.float32))
        loss_true = cnn_loss(cnn_output_true, train_target_true)

        loss_true.backward()
        cnn_opt.step() # optimizer.step() performs one optimization step, updating the parameters via gradient descent; since the update is based on gradients, loss.backward() must be called first to compute them.

    for iteration, train_sample_false in enumerate(train_set_false):

        cnn_opt.zero_grad()
        cnn_output_false = cnn(train_sample_false.to(torch.float32))
        loss_false = cnn_loss(cnn_output_false, train_target_false)

        loss_false.backward()
        cnn_opt.step()


    if i%5 == 0 :
        loss_true_numpy = loss_true.detach().numpy()
        loss_false_numpy = loss_false.detach().numpy()
        plt.scatter(i, loss_true_numpy, c='b')
        plt.scatter(i, loss_false_numpy, c='r')

    if i%20 == 0:
        print(cnn.state_dict())

plt.show()



if __name__ == '__main__':
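    # quick manual checks: constant matrices should give outputs close to 1, random matrices close to 0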
    print(cnn(torch.tensor([[[[200,200,200],[200,200,200],[200,200,200]]]]).to(torch.float32)))
    print(cnn(torch.tensor([[[[11.11,11.11,11.11],[11.11,11.11,11.11],[11.11,11.11,11.11]]]]).to(torch.float32)))


    print(cnn(torch.tensor([[[[10,-43,7],[49,50,-51],[39,-59,71]]]]).to(torch.float32)))
    print(cnn(torch.tensor([[[[5.6,9.8,100],[12,65,6],[0.7,43,4]]]]).to(torch.float32)))
