[PyTorch] Convolutional Neural Network

Convolutional neural networks were originally designed to solve computer vision-related problems. Now they are not only used in image and video fields, but also in the processing of time series signals such as audio signals.

This article mainly focuses on the basic principles of convolutional neural networks and the use of PyTorch to implement convolutional neural networks.

1. Development context

[Figure: development history of convolutional neural networks]

2. Convolutional Neural Network

Because convolutional neural networks were first proposed to solve image problems, we usually use images as the running example when explaining their concepts.

(1) Summary

1. Problems with fully connected networks

  • Fully connected neural network: a neural network that contains only fully connected layers (every node in one layer is connected to every node in the next layer)

① Huge number of parameters

For example, suppose the input is a 200x200 image and the next hidden layer has 10^4 neurons. The fully connected layer between them then has 200x200x10^4 = 4x10^8 weight parameters.
ps Here we assume pixel-level processing of the image input, that is, each input-layer unit handles one pixel.

②Training is time-consuming

Because of the huge number of parameters, training efficiency is low when performing backpropagation.

③Overfitting

Also because of the huge number of parameters, the model has far more parameters than there is labeled data, which easily leads to overfitting during training.

2. Tricks of convolutional neural networks

①Local connection

Each convolution operation is responsible for only a small patch of the image and passes its result on to the next layer.

Take the same 200x200 input image. Under local connection, each unit in the hidden layer is connected only to a 4x4 patch of the image. The number of parameters is then 4x4x10^4 = 1.6x10^5, about 3 orders of magnitude fewer than in the fully connected layer.

② Weight sharing

  • As discussed under local connection, if the hidden layer has n neurons and each neuron is connected only to a local m x m patch of the previous layer, we have n·m^2 weight parameters. With weight sharing, every neuron uses the same m x m parameters, so no matter how many hidden units n there are, this layer has only m^2 weight parameters.
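The three parameter counts discussed so far can be checked with a few lines of arithmetic. This is only a sketch using the numbers from the text (200x200 image, 10^4 hidden units, 4x4 patches); the variable names are made up for illustration.

```python
# Sizes taken from the example in the text.
image_pixels = 200 * 200   # 200x200 input image, one input unit per pixel
hidden_units = 10 ** 4     # n: neurons in the next hidden layer
patch = 4 * 4              # m^2: local 4x4 patch seen by each neuron

fully_connected = image_pixels * hidden_units  # every pixel to every neuron
local_connection = patch * hidden_units        # n * m^2
weight_sharing = patch                         # m^2, independent of n

print(fully_connected)   # 400000000
print(local_connection)  # 160000
print(weight_sharing)    # 16
```

Bias terms are ignored here, as they are in the text.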

[Essence]
The convolutional layer holds the result of the convolution operation. The core of the convolution operation is the convolution kernel, which acts as a filter: applying it to the original image yields a new image.

Every pixel of the new image is produced by this one shared convolution kernel. Each unit of the convolutional layer stores one pixel of the new image, so the weight parameters associated with every unit are the same.

ps For details about convolution, please refer to the blog post "Wu Enda Deep Learning cnn".

[Convolution kernels and features]
A convolutional layer can have several different convolution kernels, and each kernel extracts one feature of the original image.
In practical applications we usually need to extract several features, which is achieved by adding more convolution kernels.
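To make "one kernel, one feature" concrete, here is a minimal sketch that applies two hand-written kernels to a toy image with `torch.nn.functional.conv2d`. The image and both kernels (a vertical-edge detector and a box blur) are illustrative assumptions, not taken from the post; in a real network the kernel values are learned.

```python
import torch
import torch.nn.functional as F

# Hypothetical 1-channel 5x5 "image" with a vertical edge: left columns 0, right columns 1.
image = torch.tensor([[0., 0., 1., 1., 1.]] * 5).reshape(1, 1, 5, 5)

# Two hand-written 3x3 kernels (illustrative choices).
vertical_edge = torch.tensor([[-1., 0., 1.],
                              [-1., 0., 1.],
                              [-1., 0., 1.]])
box_blur = torch.full((3, 3), 1.0 / 9.0)

# Stack the kernels into a weight of shape (out_channels=2, in_channels=1, 3, 3).
weight = torch.stack([vertical_edge, box_blur]).unsqueeze(1)

# One convolution call applies both kernels and returns one feature map per kernel.
feature_maps = F.conv2d(image, weight)
print(feature_maps.shape)  # torch.Size([1, 2, 3, 3])
print(feature_maps[0, 0])  # edge map: every row is [3., 3., 0.]
```

The first feature map responds only where the window straddles the edge, while the second is just a smoothed copy of the image, so the two kernels really do extract different features from the same input.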

3. General structure of convolutional neural network

  • A typical convolutional network consists of convolutional layers, pooling layers, fully connected layers, and a Softmax layer.

[Convolutional layer] The input of each node in this layer is only a small patch of the previous layer. The layer analyzes each patch in more depth, producing features with a higher level of abstraction.

[Pooling layer] This layer does not change the depth of the three-dimensional matrix (for an RGB image, for example, the height and width give the size of the image and the depth is the 3 channels), but it does reduce the height and width of the matrix. In essence, this layer converts a higher-resolution picture into a lower-resolution one.

[Fully connected layer] After several rounds of convolution and pooling, the convolutional neural network ends with 1 to 2 fully connected layers that deliver the final result.

[Softmax layer] Used for classification problems: it turns the outputs into class probabilities, and the corresponding activation function and objective function are chosen accordingly.

Local connection, weight sharing, and the downsampling done by the pooling layer reduce the number of parameters, lower the complexity of training, and reduce the risk of overfitting. At the same time, they give the convolutional neural network a certain degree of invariance to translation, deformation, and scale, which improves the generalization ability of the model.
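The four kinds of layers can be strung together in a few lines. The network below is a hypothetical toy model (the channel counts, the 32x32 input size, and the 10 classes are all assumptions chosen for illustration), meant only to show the order of the layers and how the data shape changes.

```python
import torch
import torch.nn as nn

# Hypothetical toy network; layer sizes are illustrative, not from the post.
tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # convolutional layer: depth 3 -> 8, size kept
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),      # pooling layer: 32x32 -> 16x16, depth unchanged
    nn.Flatten(),                               # (N, 8, 16, 16) -> (N, 2048)
    nn.Linear(8 * 16 * 16, 10),                 # fully connected layer: 10 class scores
    nn.Softmax(dim=1),                          # Softmax layer: scores -> probabilities
)

x = torch.randn(4, 3, 32, 32)  # a batch of 4 random 32x32 RGB images
probs = tiny_cnn(x)
print(probs.shape)       # torch.Size([4, 10])
print(probs.sum(dim=1))  # each row sums to 1, up to rounding
```

When training with `nn.CrossEntropyLoss`, the final `nn.Softmax` is usually left out, because that loss applies the softmax internally.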

(2) Convolutional layer

1. Basic knowledge

The most important part of the convolutional layer is the convolution kernel (kernel), also known as the filter (filter). The convolution kernel transforms a sub-node matrix of the current layer into one node matrix of the next layer.
[Figure: a convolution kernel connecting a 3x32x32 input (left) to a convolutional layer with 5 neurons (right)]

As the figure above suggests, when we work with convolutional neural networks the most important thing is to understand the parameters of the convolution kernel and the corresponding settings of the neurons.

  • The size of the convolution kernel (its height and width) is specified manually, and the sub-node matrix it reads from the current layer has the same size as the kernel.
  • The (processing) depth of the convolution kernel is the same as the depth of the node matrix of the current layer.

Note that the depth of the convolution kernel is matched automatically: even if the incoming matrix of the current layer is three-dimensional, we only need to specify two parameters manually, the height and the width.

Generally speaking, the size of the convolution kernel is chosen as 3x3 or 5x5.
In the above figure, the matrix corresponding to the input image is shown on the left, and its size is 3x32x32.
Therefore, if we choose a 5x5 kernel, the size of each convolution kernel is actually 3x5x5, and each neuron in the convolutional layer (a small ellipse in the right area of the figure) has weights for a 3x5x5 region of the input data, 75 weights in total.

ps Pay attention to the order in which PyTorch describes data dimensions: (batch, channels, height, width).

  • The number of convolution kernels is the output depth of the convolutional layer. In the right area of the above figure there are 5 neurons, corresponding to 5 convolution kernels. The number of convolution kernels is the same as the number of filters used.
  • Stride: the sliding stride is the number of pixels the kernel moves at each step of the convolution operation.

Sliding the kernel over the input generally makes the output smaller than the input.

  • Boundary padding: different padding modes are available as required. With a padding of 0, nothing is added and the output shrinks. With a padding greater than 0, the input is surrounded by zeros, which can keep the output the same size as the input and helps ensure that boundary information is not lost during the convolution operation.
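The kernel parameters above can be inspected directly on an `nn.Conv2d` module. The sketch below assumes the setting from the figure (a 3x32x32 input and 5 kernels of size 5x5); the stride and padding variants are extra illustrations.

```python
import torch
import torch.nn as nn

# 3 input channels (RGB), 5 kernels, each 5x5 in height and width.
conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=5)

# Weight layout is (out_channels, in_channels, kernel_height, kernel_width):
# only height and width were specified, the depth 3 follows from in_channels.
print(conv.weight.shape)       # torch.Size([5, 3, 5, 5])
print(conv.weight[0].numel())  # 75 weights per kernel

# Inputs are laid out as (batch, channels, height, width).
x = torch.randn(1, 3, 32, 32)
plain = conv(x)
padded = nn.Conv2d(3, 5, 5, padding=2)(x)  # zero padding keeps the 32x32 size
strided = nn.Conv2d(3, 5, 5, stride=2)(x)  # a larger stride shrinks the output further

print(plain.shape)    # torch.Size([1, 5, 28, 28])
print(padded.shape)   # torch.Size([1, 5, 32, 32])
print(strided.shape)  # torch.Size([1, 5, 14, 14])
```

In all three cases the output depth is 5, the number of kernels, regardless of stride or padding.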

[Convolution calculation formula]
The output size after the convolution operation is given by the following formula:
$$W' = \mathrm{floor}\left(\frac{W - F + 2P}{S} + 1\right)$$

Here floor denotes rounding down to an integer, W is the size of the input data, F is the size of the convolution kernel in the convolutional layer, S is the stride, and P is the amount of zero padding.
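The formula is easy to turn into a small helper. The function name `conv_output_size` is made up for this sketch; it computes one spatial dimension at a time, and the three calls are just sample inputs.

```python
import math

def conv_output_size(w, f, s=1, p=0):
    """Output size W' = floor((W - F + 2P) / S + 1) along one dimension."""
    return math.floor((w - f + 2 * p) / s + 1)

print(conv_output_size(32, 5))               # 28: 5x5 kernel, no padding
print(conv_output_size(32, 3, s=1, p=1))     # 32: 3x3 kernel with padding 1 keeps the size
print(conv_output_size(224, 11, s=4, p=2))   # 55: 11x11 kernel, stride 4, padding 2
```

For a non-square input or kernel, the same function is applied separately to the height and to the width.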

2. Calling in PyTorch

PyTorch provides a ready-made convolution module, nn.Conv2d().
Its parameters are as follows

nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
  • in_channels: The depth (number of channels) of the input data, determined by the size of the incoming data
  • out_channels: The depth of the output data, determined by the number of convolution kernels chosen
  • kernel_size: The size of the convolution kernel; pass a single number for a square kernel and a tuple for a non-square kernel
  • stride: The sliding stride; the default is 1
  • padding: The amount of zero padding on the boundary; the default is 0
  • dilation: The spacing between the elements of the convolution kernel; the default is 1
  • groups: The number of groups the input and output channels are split into, which controls how input channels are connected to output channels; the default is 1
  • bias: Whether to add a learnable bias to the output; the default is True
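'''
Example calls to the convolution module in PyTorch
'''
import torch
import torch.nn as nn

# Square kernel, equal stride in both directions
m = nn.Conv2d(16, 33, 3, stride=2)

# Non-square kernel, unequal stride and boundary padding
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))

# Non-square kernel, unequal stride, boundary padding and dilation
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))

# Run the convolution on a random input of shape (batch, channels, height, width)
input = torch.randn(20, 16, 50, 100)
output = m(input)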
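It can help to look at the output shape of each of the three example layers. The sketch below rebuilds them and runs the same random input through each; the dictionary keys are just labels chosen for this example.

```python
import torch
import torch.nn as nn

x = torch.randn(20, 16, 50, 100)  # (batch, channels, height, width)

layers = {
    "square kernel, stride 2": nn.Conv2d(16, 33, 3, stride=2),
    "non-square kernel, padding": nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)),
    "with dilation": nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)),
}

shapes = {name: tuple(layer(x).shape) for name, layer in layers.items()}
for name, shape in shapes.items():
    print(name, shape)
# square kernel, stride 2 (20, 33, 24, 49)
# non-square kernel, padding (20, 33, 28, 100)
# with dilation (20, 33, 26, 100)
```

The first two results follow from the output size formula applied to height and width separately. With dilation D, a kernel of size F covers D·(F − 1) + 1 input positions, so that value takes the place of F: for the height of the third layer, 3·(3 − 1) + 1 = 7 and floor((50 − 7 + 8)/2 + 1) = 26.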

(3) Pooling layer

A pooling layer is usually inserted after a convolutional layer. It serves the following purposes:

  • Gradually reduce the spatial size of the data passing through the network
  • Reduce the number of parameters in the network
  • Reduce the use of computing resources
  • Effectively control model overfitting

There are generally two ways to compute pooling, Max Pooling and Mean Pooling: the former takes the [maximum value] of each window and the latter takes the [average value]. The discussion below uses Max Pooling as the example.

1. Calculation process

The pooling layer only down-samples the height and width of the data and does not change its depth.

It takes each depth slice of the input data separately and slides a window over it. Under Max Pooling, the maximum value inside each window is taken as the output.
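The sliding-window computation can be written out by hand for a single depth slice. `max_pool_slice` below is a deliberately naive helper written for this illustration (it is not a PyTorch function), and the 4x4 input values are made up; the result is compared against `F.max_pool2d`.

```python
import torch
import torch.nn.functional as F

def max_pool_slice(slice_2d, size=2, stride=2):
    """Naive max pooling of one depth slice (a 2-D tensor), for illustration only."""
    h, w = slice_2d.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = torch.empty(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            # Cut out the current window and keep only its largest value.
            window = slice_2d[i * stride:i * stride + size,
                              j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

x = torch.tensor([[1., 1., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]])

manual = max_pool_slice(x)
builtin = F.max_pool2d(x.reshape(1, 1, 4, 4), kernel_size=2, stride=2)[0, 0]
print(manual)                        # values [[6, 8], [3, 4]]
print(torch.equal(manual, builtin))  # True
```

With a 2x2 window and stride 2, the 16 input values are reduced to 4, one per window.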
Why is the pooling layer effective?

  • Image features have local invariance, that is to say, the reduced image obtained after downsampling still keeps its features

Based on this, performing the convolution operation after the picture has been reduced (the convolution operation being the use of designed kernels to extract the features of the picture) reduces the time the convolution takes.

  • With the commonly used pooling scheme (pooling size 2x2, stride 2), downsampling discards 75% of the values of the original image; only the largest value in each window is kept, which can also remove some noise.

2. Calling in PyTorch

Corresponding to the two pooling schemes, PyTorch provides nn.MaxPool2d and nn.AvgPool2d.

nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
  • For the shared parameters, please refer to the explanation in the convolutional layer; note that when stride is None it defaults to kernel_size
  • return_indices: Whether to also return the index of each maximum value
  • ceil_mode: Use ceil instead of floor (round up instead of down) when computing the output size

Similarly, if the pooling window is square you only need to pass in a single number; otherwise you need to pass in a tuple.
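A short sketch of both pooling modules and of the less obvious parameters follows. The 5x5 input filled with 0..24 is an arbitrary choice that makes the results easy to read.

```python
import torch
import torch.nn as nn

x = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)  # values 0..24

# Max pooling that also returns where each maximum came from.
max_pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
pooled, indices = max_pool(x)
print(pooled[0, 0])   # values [[6, 8], [16, 18]]
print(indices[0, 0])  # flat positions of the maxima: [[6, 8], [16, 18]]

# Mean pooling over the same windows.
averaged = nn.AvgPool2d(kernel_size=2, stride=2)(x)
print(averaged[0, 0])  # values [[3, 5], [13, 15]]

# ceil_mode=True rounds the output size up, so the partial last window is kept.
ceil_out = nn.MaxPool2d(2, stride=2, ceil_mode=True)(x)
print(ceil_out.shape)  # torch.Size([1, 1, 3, 3])

# A non-square window and stride are passed as tuples.
rect_out = nn.MaxPool2d((2, 3), stride=(2, 1))(x)
print(rect_out.shape)  # torch.Size([1, 1, 2, 3])
```

With the default `ceil_mode=False`, the fifth row and column of the 5x5 input are simply dropped, which is why the first two results are 2x2.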

3. Classical Convolutional Neural Networks

1. LeNet

LeNet here refers to LeNet-5, which was proposed by Professor Yann LeCun in the 1998 paper "Gradient-based learning applied to document recognition". It was the first convolutional neural network successfully applied to digit recognition problems.

The LeNet-5 model has a total of 7 layers (2 convolutional layers, 2 pooling layers, 2 fully connected layers, and an output layer).

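import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        # Two convolutional layers with 5x5 kernels (3 input channels, e.g. RGB)
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Two fully connected layers and the output layer;
        # 16*5*5 is the flattened feature size for a 32x32 input
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        # Flatten to (batch, 16*5*5) before the fully connected layers
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out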
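The input size 16*5*5 of fc1 in the code above only works out for a particular image size. The sketch below traces the shapes through standalone copies of the two convolutional layers, assuming a 32x32 input with 3 channels (the post does not state the input size, so this is an assumption).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)  # assumed input: one 32x32 image with 3 channels
conv1 = nn.Conv2d(3, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

shapes = []
out = F.relu(conv1(x))
shapes.append(tuple(out.shape))  # (1, 6, 28, 28): 32 - 5 + 1 = 28
out = F.max_pool2d(out, 2)
shapes.append(tuple(out.shape))  # (1, 6, 14, 14)
out = F.relu(conv2(out))
shapes.append(tuple(out.shape))  # (1, 16, 10, 10): 14 - 5 + 1 = 10
out = F.max_pool2d(out, 2)
shapes.append(tuple(out.shape))  # (1, 16, 5, 5)

flat = out.view(out.size(0), -1)
print(shapes)
print(flat.shape)  # torch.Size([1, 400]), and 400 = 16 * 5 * 5
```

For any other input size, the first argument of fc1 would have to change accordingly.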

2. AlexNet

Proposed in 2012 by Alex Krizhevsky, a student of Hinton; the architecture successfully applied techniques such as ReLU, Dropout, and LRN.

The structure diagram of the AlexNet model is shown below. Because computing power was limited at the time, the computation was split across two GPUs in parallel, so the diagram looks somewhat complicated.
[Figure: original two-GPU AlexNet structure]
The following shows the equivalent model structure for computation on a single GPU.

The entire AlexNet contains 5 convolutional layers, 3 pooling layers, and 3 fully connected layers.
Each convolutional layer and fully connected layer is followed by a ReLU layer, and dropout layers are also used with the fully connected layers.

[Figure: equivalent single-GPU AlexNet structure]

import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes):
        super(AlexNet, self).__init__()
        # Feature extractor: 5 convolutional layers and 3 max-pooling layers.
        # The in_channels of each layer must equal the out_channels of the previous one.
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # Classifier: 3 fully connected layers with dropout.
        # 256*6*6 is the flattened feature size for a 224x224 input.
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        # Flatten to (batch, 256*6*6) before the classifier
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
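The 256*6*6 in the classifier comes from the spatial size left after the last pooling layer. The sketch below traces that size with the convolution output formula, assuming a 224x224 input (the post does not state the input size); the helper `out_size` and the `steps` list are written just for this illustration, using the kernel sizes, strides, and paddings from the code above.

```python
import math

def out_size(w, f, s=1, p=0):
    """Output size floor((W - F + 2P) / S + 1) along one dimension."""
    return math.floor((w - f + 2 * p) / s + 1)

# (description, kernel size F, stride S, padding P) for each layer in `features`
steps = [
    ("conv 11x11, stride 4, padding 2", 11, 4, 2),
    ("max pool 3x3, stride 2", 3, 2, 0),
    ("conv 5x5, padding 2", 5, 1, 2),
    ("max pool 3x3, stride 2", 3, 2, 0),
    ("conv 3x3, padding 1", 3, 1, 1),
    ("conv 3x3, padding 1", 3, 1, 1),
    ("conv 3x3, padding 1", 3, 1, 1),
    ("max pool 3x3, stride 2", 3, 2, 0),
]

size = 224  # assumed input height and width
sizes = [size]
for name, f, s, p in steps:
    size = out_size(size, f, s, p)
    sizes.append(size)
    print(f"{name}: {size}x{size}")

print(sizes)           # [224, 55, 27, 27, 13, 13, 13, 13, 6]
print(256 * size * size)  # 9216 inputs to the first fully connected layer
```

The ReLU layers are left out because they do not change the shape.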

Origin blog.csdn.net/kodoshinichi/article/details/109680213