In-depth analysis of a PyTorch convolutional neural network example, taking the MNIST data set as an example

General steps for building a deep learning network in PyTorch

  1. Load the data set;
  2. Define the network structure model;
  3. Define the loss function;
  4. Define the optimization algorithm;
  5. Iterative training
  6. Test set verification.
    Among them, the training phase is divided into four parts: (1) the forward pass, which computes the output from the input; (2) computing the value of the loss function from the output and the labels; (3) the backward pass, which computes the gradient of each variable from the loss; (4) the optimizer update, which adjusts the parameters according to the gradients. In code, one such iteration looks roughly like the sketch below.
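A minimal sketch of one training iteration (model, criterion, optimizer, img and label are placeholders here; the complete, runnable program is developed step by step in the rest of this article):

out = model(img)                # 1. forward pass: compute the output from the input
loss = criterion(out, label)    # 2. compute the loss from the output and the labels
optimizer.zero_grad()           #    clear gradients left over from the previous step
loss.backward()                 # 3. backward pass: compute the gradient of each parameter
optimizer.step()                # 4. the optimizer updates the parameters using the gradients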
    The following is a specific case analysis:

1. Configuration library

# Import the required libraries
import torch
from torch import nn,optim
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
  1. optim is the optimizer module; it provides concrete optimization algorithms such as SGD, Momentum, RMSProp, AdaGrad and Adam. Momentum accelerates plain gradient descent, while the other three adapt the learning rate. The most commonly used are SGD and Adam.
  2. Variable is a wrapper around a tensor; it is what gets placed into the computation graph for forward propagation, back propagation and automatic differentiation, and it is a very important basic object. It has three important attributes: data, grad and creator. data is the wrapped Tensor itself, grad is the gradient propagated back to this Variable, and creator is a reference to the Function that created the Variable, used to trace back the whole creation chain. A Variable created directly by the user has creator set to None and is called a leaf Variable; autograd only accumulates gradients into leaf Variables. (In recent PyTorch versions Variable has been merged into Tensor and creator is exposed as grad_fn; a small autograd example follows this list.)
  3. DataLoader works together with Dataset: data is first wrapped into a Dataset object, which is then passed to a DataLoader so that the data can be batched, shuffled and iterated over conveniently.
    In other words, the data set is packed into a standard format that is easy to process and reference.
  4. torchvision.transforms is PyTorch's image preprocessing module; it contains many functions for transforming image data, all of which are needed when reading image data.
  5. As the name suggests, torchvision.datasets provides a collection of standard data sets; data sets such as MNIST can be loaded with the corresponding call.
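As a small standalone illustration of point 2 above (this snippet is not part of the original program), a tensor can be wrapped, pushed through a tiny computation, and autograd will fill in its gradient:

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)  # a user-created (leaf) Variable
y = (x * 3).sum()       # the forward computation builds the graph
y.backward()            # the backward pass computes gradients automatically
print(x.data)           # the wrapped Tensor itself
print(x.grad)           # dy/dx, a 2x2 tensor of 3s
print(y.grad_fn)        # the Function that created y ("creator" in older versions)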

2. Configure hyperparameters

# Configure the hyperparameters
torch.manual_seed(1)
batch_size = 128    # batch size
learning_rate = 1e-2
num_epocher = 10    # number of training epochs
  1. torch.manual_seed() sets the random seed so that runs of the program are reproducible, which makes testing easier.
  2. batch_size is the batch size, i.e. the number of samples processed in each step. Iterations per epoch = total number of samples / batch size, so within a certain range a larger batch size means fewer iterations are needed to run through the full data set (one epoch) and the data is processed faster. Increasing it blindly, however, can make it take longer to reach a given accuracy. (A quick calculation of the iteration count follows this list.)
  3. learning_rate is the learning rate. The smaller it is, the smaller each gradient-descent step and the more training is needed to reach the target accuracy; if it is too large, gradient descent may fail to converge and learning fails.
  4. num_epocher is the number of epochs, i.e. how many times the whole data set is run through during training.
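For example, with the settings above and the 60,000 images in the MNIST training set, the number of iterations per epoch can be worked out directly (a quick sanity check, not part of the original program):

import math
num_train_samples = 60000                                   # size of the MNIST training set
iterations_per_epoch = math.ceil(num_train_samples / batch_size)
print(iterations_per_epoch)                                 # 60000 / 128 -> 469 batches (the last one is partial)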

3. Load the data set

train_dataset = datasets.MNIST(	# training set
    root='./data',
    train = True,
    transform=transforms.ToTensor(),
    download=False)

test_dataset = datasets.MNIST(
    root='./data',
    train=False,    # test set
    transform=transforms.ToTensor())
train_loader = DataLoader(train_dataset,batch_size=batch_size,shuffle=True)
test_loader = DataLoader(test_dataset,batch_size=batch_size,shuffle=False)
  1. datasets.MNIST loads the MNIST handwritten-digit data set; a locally downloaded copy can be used. root is where the data set is stored; train indicates whether this is the training set; transforms.ToTensor() normalizes the data, converting an Image with values in [0, 255] into a Tensor with values in [0, 1.0], which speeds up the convergence of gradient descent; download indicates whether the data set should be downloaded from the Internet, and must be True if it is not present locally.
  2. The test set is loaded in the same way; its parameters are similar to those of the training set.
  3. As mentioned in the previous section, DataLoader wraps the data and makes it quick to process. The shuffle parameter controls whether the order is shuffled; the training set must be shuffled to help prevent overfitting. (A quick inspection of one batch follows this list.)
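To confirm that the loaders are set up as intended, one batch can be pulled out and inspected (a small sketch; the shapes assume the configuration above):

images, labels = next(iter(train_loader))
print(images.shape)       # torch.Size([128, 1, 28, 28]): 128 single-channel 28x28 images
print(labels.shape)       # torch.Size([128])
print(images.min().item(), images.max().item())   # values lie in [0, 1] after ToTensor()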

4. Define the convolutional network model (the key part)

# Define the convolutional neural network model
class Cnn(nn.Module):
    def __init__(self,in_dim,n_class):
        super(Cnn,self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_dim,6,3,stride=1,padding=1),   #28x28
            nn.ReLU(True),
            nn.MaxPool2d(2,2),  # 14x14, the pooling layer reduces the spatial size
            nn.Conv2d(6,16,5,stride=1,padding=0),   #10x10x16
            nn.ReLU(True),
            nn.MaxPool2d(2,2)   #5x5x16
         )
        self.fc = nn.Sequential(
            nn.Linear(400,120), #400=5*5*16
            nn.Linear(120,84),
            nn.Linear(84,n_class)
        )
    def forward(self,x):
        out = self.conv(x)
        out = out.view(out.size(0),400)
        out = self.fc(out)
        return out

model = Cnn(1,10)   # images are 28*28 with 1 channel; 10 is the number of classes
# print the model
print(model)
  1. nn.Module is a very important class; it contains the definitions of each layer of the network and of the forward function. Every custom network model must inherit from nn.Module.
"""
自定义网络结构:
    需要继承nn.Module类,并实现forward方法。
    一般把网络中具有可学习参数的层放在构造函数__init__()中,
    不具有可学习参数的层(如ReLU)可放在构造函数中,也可不放在构造函数中(而在forward中使用nn.functional来代替)
    
    只要在nn.Module的子类中定义了forward函数,backward函数就会被自动实现,而不需要像forword那样需要重新定义。
"""
  1. The super() call invokes the __init__ of the base class when subclassing. Without it, the base class's __init__ would not run unless it were called explicitly; using super() also prevents a base class from being initialized more than once.
  2. nn.Sequential is, in the official description, an ordered container: the neural network modules are added to the computation graph and executed in the order they are passed to the constructor; an ordered dictionary of modules can also be passed as the argument. Here it is used to assemble the network structure.
  3. nn.Conv2d is the convolutional layer of the network. Its signature and default parameters are:
    nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
    The parameters used in the program:
      in_channels: number of input channels. RGB images have 3 channels and black-and-white images have 1; MNIST images are black and white, so this is 1;
      out_channels: number of output channels, i.e. the number of convolution kernels, which determines how many feature maps are extracted; here it is 6;
      kernel_size: size of the convolution kernel; here the kernel is 3*3;
      stride: the step size, default 1;
      padding: the amount of zero padding, used here to keep the output size unchanged; it is 1.
    The output size of a convolutional layer is:
      W_out = (W - F + 2P)/S + 1,  H_out = (H - F + 2P)/S + 1
    where W and H are the input width and height, F is the kernel size, P the padding and S the stride.

Therefore, after the 1×28×28 input passes through this convolutional layer, the output is 6×28×28: (28 - 3 + 2×1)/1 + 1 = 28, and 6 is the number of output channels, i.e. the depth. The element-by-element arithmetic of the convolution itself is not detailed here.

  1. nn.ReLU is the activation function, which reduces the amount of computation and helps alleviate over-fitting. Its formula is ReLU(x) = max(0, x).
  2. nn.MaxPool2d is the pooling layer of the convolutional neural network; it downsamples the feature maps and reduces the amount of computation. Max pooling is used here with a 2*2 window and a stride of 2. The output size of a pooling layer is computed as:
      W_out = (W - F)/S + 1
      H_out = (H - F)/S + 1
    where W is the image width, H the image height, D the image depth (number of channels), F the width/height of the pooling window, and S the stride. The number of output channels (depth) stays unchanged.
    Therefore, after the first pooling layer in the program the feature map is 6×14×14, which shows how the amount of computation is reduced. Max pooling simply keeps the largest value inside each window.

  3. After the two convolution and pooling stages, the output is a three-dimensional 16×5×5 feature map. Next come the fully connected layers (nn.Linear), which take this three-dimensional feature data and integrate it.
    Every node of a fully connected layer is connected to all nodes of the previous layer and is used to combine the features extracted earlier; because of this full connectivity, the fully connected layers usually hold the most parameters.
    In the program, 16×5×5 = 400 values are passed in, so the first Linear layer's input size is 400; since handwritten digits fall into 10 classes, the final output has 10 nodes. (These sizes can be verified with the short check after this list.)

  4. The forward function implements forward propagation, i.e. it pushes the data through the network. Once forward is defined, the backpropagation function backward() is provided automatically by autograd; it does not need to be written explicitly like forward and can simply be called.

  5. model = Cnn(1, 10) instantiates the model: 1 is the number of input channels (the 28*28 images are grayscale) and 10 is the number of classes.
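The sizes derived above (28×28 → 14×14 → 10×10 → 5×5, i.e. 16×5×5 = 400 inputs to the first Linear layer) can be verified by pushing a dummy batch through the convolutional part of the model (a short check, not in the original program):

dummy = torch.zeros(1, 1, 28, 28)           # one fake single-channel 28x28 image
feat = model.conv(dummy)
print(feat.shape)                           # torch.Size([1, 16, 5, 5])
print(feat.view(feat.size(0), -1).shape)    # torch.Size([1, 400]), matching nn.Linear(400, 120)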

5. Model training

# Model training
# define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(),lr=learning_rate)
# start training
for epoch in range(num_epocher):
    running_loss = 0.0
    running_acc = 0.0
    for i,data in enumerate(train_loader,1):
        img,label = data
        img = Variable(img)
        label = Variable(label)
        # forward pass
        out = model(img)
        #print(out)
        loss = criterion(out,label) # loss
        running_loss += loss.item() * label.size(0)
        # total loss: the criterion returns the mean over the batch, so multiply the batch size back in
        _,pred = torch.max(out,1)   # predicted labels
        num_correct = (pred == label).sum() # number of correct predictions
        #accuracy = (pred == label).float().mean()   # accuracy of this batch
        running_acc += num_correct.item()   # running total of correct predictions
        # backward pass
        optimizer.zero_grad()   # zero the gradients
        loss.backward() # back-propagate to compute the gradients
        optimizer.step()    # update the weights W, b using the gradients
    # after each epoch, print the loss and accuracy on the training set
    print('Train{} epoch, Loss: {:.6f},Acc: {:.6f}'.format(epoch+1,running_loss / (len(train_dataset)),running_acc / (len(train_dataset))))
  1. nn.CrossEntropyLoss is the cross entropy loss function, used to calculate the loss.
  2. optim.SGD is the optimizer; its job is to update the network's parameters according to the gradients so that training converges efficiently. model.parameters() hands the network's learnable parameters to the optimizer.
  3. enumerate() is a Python built-in that wraps a traversable object (such as a list, tuple or string) into an indexed sequence, yielding the index and the element at the same time.
    enumerate(a, start)
    a is an iterable object and start is the number the count begins at; here the program traverses the training loader starting from 1.
  4. Variable, mentioned in section 1, wraps a tensor so that it can be placed in the computation graph for forward propagation, back propagation and automatic differentiation. Here img and label are wrapped.
  5. out = model(img) passes the data through the model, i.e. the forward pass.
  6. loss = criterion(out, label) compares the model's output with the labels to compute the loss.
  7. running_loss += loss.item() * label.size(0) accumulates the loss so that the average loss over the whole epoch can be computed later. Because the criterion returns the mean loss over the batch, it has to be multiplied by the batch size before being accumulated.
  8. torch.max(out, 1) returns, for each row, the largest value together with its column index; the index is used as the predicted class, so pred holds the predictions (see the small example after this list).
  9. num_correct = (pred == label).sum(), find the number of correct results (same as predicted results).
  10. running_acc += num_correct.item() Find the total number of correct results.
  11. optimizer.zero_grad() clears the gradients accumulated in previous iterations so that the next backward pass starts fresh.
  12. loss.backward() performs back propagation.
  13. optimizer.step() uses the gradients computed by back propagation to update the network's weights, thereby reducing the model's loss.
  14. Finally, after each epoch the loss and accuracy on the training set are printed.
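As a quick illustration of item 8, torch.max with dim=1 returns both the maximum value and its column index for every row (a standalone example, not part of the training loop):

scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.4]])
values, pred = torch.max(scores, 1)
print(values)   # tensor([2.0000, 1.5000])
print(pred)     # tensor([1, 0]): the predicted class index for each row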

6. Test the model's recognition rate on the test set

# Model testing
model.eval()    # put the model into evaluation (test) mode
eval_loss = 0
eval_acc = 0
for data in test_loader:
    img,label = data
    img = Variable(img,volatile=True)   # volatile has been removed in newer versions; use with torch.no_grad() instead (a rewritten loop is sketched below)
    # volatile was meant to tell autograd that backward will not be called
    # wrapping label in a Variable is not needed during testing
    out = model(img)    # forward pass
    loss = criterion(out,label) # compute the loss
    eval_loss += loss.item() * label.size(0)    # total loss
    _,pred = torch.max(out,1)   # predicted labels
    num_correct = (pred == label).sum() # correct predictions
    eval_acc += num_correct.item()  # total number of correct predictions

print('Test Loss:{:.6f},Acc: {:.6f}'
      .format(eval_loss/ (len(test_dataset)),eval_acc * 1.0/(len(test_dataset))))
  1. model.eval() switches the model into evaluation mode; it fixes BatchNorm and Dropout layers so that their behavior does not change during testing. Neither of those layer types is used in this program, but calling it is still good practice.
  2. The rest of the procedure is similar to the training loop, so it is not repeated here; a version rewritten with torch.no_grad() is sketched below.
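In current PyTorch versions the deprecated volatile flag (and the Variable wrapper itself) can be dropped, and the whole test loop can be wrapped in torch.no_grad(), which turns off gradient tracking. A possible rewrite of the loop above, using the same variable names, might look like this (a sketch, not the original author's code):

model.eval()
eval_loss = 0
eval_acc = 0
with torch.no_grad():                       # no gradients are needed during evaluation
    for img, label in test_loader:
        out = model(img)                    # forward pass
        loss = criterion(out, label)
        eval_loss += loss.item() * label.size(0)
        _, pred = torch.max(out, 1)
        eval_acc += (pred == label).sum().item()
print('Test Loss: {:.6f}, Acc: {:.6f}'.format(
    eval_loss / len(test_dataset), eval_acc / len(test_dataset)))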

7. Run results

Cnn(
(conv): Sequential(
(0): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(4): ReLU(inplace=True)
(5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(fc): Sequential(
(0): Linear(in_features=400, out_features=120, bias=True)
(1): Linear(in_features=120, out_features=84, bias=True)
(2): Linear(in_features=84, out_features=10, bias=True)
)
)
Train1 epoch, Loss: 2.285776,Acc: 0.221550
Train2 epoch, Loss: 1.370810,Acc: 0.636100
Train3 epoch, Loss: 0.411640, Acc: 0.878833
Train4 epoch, Loss: 0.294587, Acc: 0.912050
Train5 epoch, Loss: 0.231720, Acc: 0.930100
Train6 epoch, Loss: 0.188466, Acc: 0.942800
Train7 epoch, Loss: 0.158935, Acc: 0.952733
Train8 epoch, Loss: 0.139243, Acc: 0.958150
Train9 epoch, Loss: 0.125945, Acc: 0.961917
Train10 epoch, Loss: 0.115717, Acc: 0.965000
E:/Pycharm/project/project_pytorch/.idea/Conv_complete.py:94: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():instead.
img = Variable(img,volatile=True) #Change to with torch.no_grad() but not yet: volatile here has been removed in the new version
Test Loss: 0.101987,Acc: 0.967800

Origin blog.csdn.net/weixin_45371989/article/details/103922630