Analysis and experiment of 2d convolution operator in CNN

As a counterpart to the previous experiment on the graph convolution operator in GNNs, this experiment analyzes the 2d convolution operator in CNNs.

In a CNN, the convolution operation extracts local features of the image. Each layer computes some local features and passes them on to the next layer, so the features propagate from layer to layer and cover progressively larger regions of the image. The image is finally processed through these features, which greatly improves computational efficiency and accuracy.

As a basic operator, 2d convolution is widely used in many models, such as LeNet, AlexNet, GoogLeNet, and ResNet. Unlike in GNNs, the convolution operation is the same across the various CNN models: in PyTorch, the entire model is built on torch.nn.Conv2d (only the convolution parameters differ slightly), and the specific model structures can be looked up in the reference linked below. Therefore, which model is used for the experiment is not so important.

For the classic model structures, see: https://github.com/zergtant/pytorch-handbook/blob/master/chapter2/2.4-cnn.ipynb

The following experiment is based on the handwritten digit recognition task: LeNet-5 is trained on the MNIST dataset.

Research and analysis of torch.nn.Conv2d

The source code of torch.nn.Conv2d is in torch/nn/modules/conv.py. Reading it, the core 2d convolution operation can be traced back to the F.conv2d() function. No corresponding Python implementation can be found for that function; there is only a function declaration in the file torch/_C/_VariableFunctions.pyi, because the function comes from the THNN library written in C++ (for speed).

The tracing path and ideas are as follows:

torch.nn.Conv2d --> torch/nn/modules/conv.py --> F.conv2d() --> torch/_C/_VariableFunctions.pyi --> THNN library written in C++
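The first steps of this tracing path can be reproduced from a Python session. The following is a minimal sketch, assuming PyTorch is installed; it only asks Python where torch.nn.Conv2d is defined and whether Python source is available for F.conv2d. The variable names are chosen for this illustration, and the result of the second check may depend on the PyTorch version.

```python
import inspect
import os

import torch
import torch.nn.functional as F

# Step 1: locate the Python file that defines torch.nn.Conv2d.
conv_source = inspect.getsourcefile(torch.nn.Conv2d)
print(os.path.basename(conv_source))

# Step 2: try to read Python source for F.conv2d. For a function that is
# implemented in compiled code, inspect cannot find any Python source and
# raises TypeError (or OSError).
try:
    inspect.getsource(F.conv2d)
    has_python_source = True
except (TypeError, OSError):
    has_python_source = False
print("Python source available for F.conv2d:", has_python_source)
```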

In short, the conclusion is that in PyTorch the core code of 2d convolution is written in C++!

Since PyTorch compiles and links the C++ libraries in advance and then wraps them, modifying this code directly would mean changing and rebuilding the entire PyTorch framework, which is a difficult and very large engineering effort.

Therefore, the following experiment rewrites conv2d in Python based on the principle of 2d convolution, divides it into stages, and adds a timing mechanism to each stage.

  • Disadvantage: the program runs significantly slower.
    • Total time for 5 epochs using the built-in F.conv2d(): 61.9329s
    • Total time for 5 epochs using my rewritten _conv2d with the timing mechanism: 880.9045s
    • The rewritten version is about 14 times slower.
  • Advantage: it is convenient for modifying the program and experimenting.

The idea behind rewriting conv2d and the division into stages

First of all, we must figure out what torch.nn.Conv2d actually does.

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

Macroscopically, the input data has dimensions [batch_size, in_channel, h_in, w_in], and the output of the 2d convolutional layer has dimensions [batch_size, out_channel, h_out, w_out]. Apart from the first dimension, batch_size, the other three dimensions (number of channels, image height, and image width) can all change.

  • Input data: [batch_size, in_channel, h_in, w_in]
  • Weight of the convolutional layer: [out_channel, in_channel // groups, kernel_size, kernel_size] (groups defaults to 1)
  • Output of the 2d convolutional layer: [batch_size, out_channel, h_out, w_out]
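As a small illustration of how these shapes relate, the sketch below computes the output height or width from the Conv2d parameters, using the output-size formula given in the PyTorch documentation for torch.nn.Conv2d. The helper names conv2d_output_size and conv2d_weight_shape are hypothetical and exist only for this example; a square kernel and the same stride, padding, and dilation in both directions are assumed.

```python
def conv2d_output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output height (or width) of a 2d convolution along one spatial axis."""
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv2d_weight_shape(in_channel, out_channel, kernel_size, groups=1):
    """Shape of the weight tensor of a Conv2d layer with a square kernel."""
    return (out_channel, in_channel // groups, kernel_size, kernel_size)


# First convolutional layer of the experiment: 28x28 input, 5x5 kernel,
# 1 input channel, 6 output channels, default stride/padding/dilation.
print(conv2d_output_size(28, 5))        # 24
print(conv2d_weight_shape(1, 6, 5))     # (6, 1, 5, 5)
```

With the default stride of 1 and no padding, the formula reduces to h_out = h_in - kernel_size + 1, which is why a 28x28 image becomes 24x24 after a 5x5 convolution.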

So how does a single unit of the 2d convolution operation actually work at the microscopic level?

  1. Slice the input data: [batch_size, in_channel, h_in, w_in] -> [batch_size, in_channel, kernel_size, kernel_size]
  2. Take the multi-dimensional dot product of the sliced data and the weight of the convolutional layer: first broadcast (by inserting a dimension with None), then multiply element by element.
    [batch_size, None, in_channel, kernel_size, kernel_size] * [None, out_channel, in_channel, kernel_size, kernel_size] --> [batch_size, out_channel, in_channel, kernel_size, kernel_size]
  3. Sum the result of the dot product over its last three dimensions: [batch_size, out_channel, in_channel, kernel_size, kernel_size] -> [batch_size, out_channel]

The above is one unit of the 2d convolution operation. Sliding the window over all positions of the image (that is, repeating the above process in a loop) finally gives the result of the convolution, with dimensions [batch_size, out_channel, h_out, w_out].

For example, for one unit of the 2d convolution operation in the experiment, the dimensions of the data change like this:

[64,1,28,28] --> slice --> [64,1,5,5] --> dot product [64,1,1,5,5]*[1,6,1,5,5]=[64,6,1,5,5] --> sum --> [64,6] --> after looping --> [64,6,24,24]
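The slice, dot product, and sum steps can be sketched in a few lines of PyTorch. This is a minimal illustration written for this section, not the code used in the experiment: the function name conv2d_slice_dot_sum is hypothetical, and the sketch assumes stride 1, no padding, no dilation, and groups=1. A batch size of 4 is used instead of 64 to keep the loop fast, and the result is compared with the built-in F.conv2d.

```python
import torch
import torch.nn.functional as F


def conv2d_slice_dot_sum(x, weight, bias=None):
    """Naive 2d convolution: stride 1, no padding, no dilation, groups=1."""
    batch_size, in_channel, h_in, w_in = x.shape
    out_channel, _, k_h, k_w = weight.shape
    h_out, w_out = h_in - k_h + 1, w_in - k_w + 1
    out = torch.empty(batch_size, out_channel, h_out, w_out,
                      dtype=x.dtype, device=x.device)
    w = weight[None]  # [1, out_channel, in_channel, k_h, k_w]
    for i in range(h_out):
        for j in range(w_out):
            # Stage 1, Slice: [batch_size, in_channel, k_h, k_w]
            patch = x[:, :, i:i + k_h, j:j + k_w]
            # Stage 2, Dot: broadcast and multiply element by element,
            # giving [batch_size, out_channel, in_channel, k_h, k_w]
            prod = patch[:, None] * w
            # Stage 3, Sum over the last three dimensions:
            # [batch_size, out_channel]
            out[:, :, i, j] = prod.sum(dim=(2, 3, 4))
    if bias is not None:
        out = out + bias[None, :, None, None]
    return out


torch.manual_seed(0)
x = torch.randn(4, 1, 28, 28)       # smaller batch than the experiment's 64
weight = torch.randn(6, 1, 5, 5)
bias = torch.randn(6)

ours = conv2d_slice_dot_sum(x, weight, bias)
reference = F.conv2d(x, weight, bias)
print(tuple(ours.shape))            # (4, 6, 24, 24)
print(torch.allclose(ours, reference, atol=1e-4))
```

The Python loop over every output position is what makes a rewrite of this kind much slower than the built-in function, which runs the whole operation in compiled code.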

Following this idea, the convolution operation can be divided into three basic stages, each corresponding to one of the 3 steps above:

  1. Slice. The core operation is slicing the input tensor.
  2. Dot (dot product). The core operation is the broadcast product of the sliced tensor and the weight tensor.
  3. Sum. The core operation is summing the resulting tensor over its last three dimensions.

Experiment

In the experiment, I measured the average time spent in each of these three stages per epoch.
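One way to collect such per-stage times is sketched below. It is an assumption about how a timing mechanism of this kind could look, not the code from the repository: the StageTimer class and its methods are hypothetical names, and the three stages are replaced by placeholder workloads so that the block runs on its own.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def average_per_epoch(self, num_epochs):
        return {name: total / num_epochs for name, total in self.totals.items()}


timer = StageTimer()
num_epochs = 2
for _ in range(num_epochs):
    for _ in range(100):                    # stands in for the sliding loop
        with timer.stage("slice"):
            window = list(range(25))        # placeholder for tensor slicing
        with timer.stage("dot"):
            prod = [v * v for v in window]  # placeholder for the product
        with timer.stage("sum"):
            total = sum(prod)               # placeholder for the summation

averages = timer.average_per_epoch(num_epochs)
for name, seconds in averages.items():
    print(f"{name}: {seconds:.6f}s per epoch")
```

Note that CUDA operations are launched asynchronously, so when the tensors live on a GPU, a call to torch.cuda.synchronize() before each clock reading is needed for the per-stage times to be meaningful.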

Reference: https://github.com/Jintao-Huang/dnn_study/blob/master/torch_%E5%BA%95%E5%B1%82%E7%AE%97%E6%B3%95%E5%AE%9E%E7%8E%B0/only_forward.py

My code: https://github.com/ytchx1999/CNN-Test

  • Experimental environment: cloud server with one Tesla T4 GPU, running PyTorch.
    • CPU
      Model: Intel® Xeon® Gold 5218 CPU
      Memory: 128G
      Cores: 64
    • GPU
      Tesla T4 × 1
      Video memory: 16G
  • Data set: MNIST.
  • Timing unit: s.

The execution time of each stage in conv2d

Execution time of each stage (s)

Model    Slice     Dot        Sum
LeNet    0.4597    19.2618    27.4245

Of the two runs, the first uses the built-in function and the second uses the rewritten function. The training results over the 5 epochs are similar, which supports the correctness of the rewritten conv2d function (apart from it being slower).

Origin blog.csdn.net/weixin_41650348/article/details/114836937