Deep Learning Basic CNN Series - Convolution Computing

Convolution calculation

Convolution is an integral transform from mathematical analysis, and image processing uses its discrete form. Note that in a convolutional neural network, the convolutional layer actually implements what mathematics defines as a cross-correlation operation, which differs from the definition of convolution in mathematical analysis. Consistent with other frameworks and convolutional neural network tutorials, this post uses the cross-correlation operation as the definition of convolution. The calculation process is shown in the figure.
[Figure: the convolution calculation process, panels (a) to (d)]
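
To make the difference between the two operations concrete, here is a minimal one-dimensional sketch in Python. The function names and sample values are illustrative assumptions, not taken from any framework: cross-correlation slides the kernel as-is, while mathematical convolution flips the kernel first.

```python
def cross_correlate1d(a, w):
    """Slide w over a without flipping (what a CNN 'convolution' layer computes)."""
    k = len(w)
    return [sum(a[i + u] * w[u] for u in range(k)) for i in range(len(a) - k + 1)]


def convolve1d(a, w):
    """Mathematical convolution (no padding): flip the kernel, then cross-correlate."""
    return cross_correlate1d(a, w[::-1])


a = [1, 2, 3, 4]   # illustrative input
w = [0, 1, 2]      # illustrative kernel

print(cross_correlate1d(a, w))  # [8, 11]
print(convolve1d(a, w))         # [4, 7]
```

The two results differ only because the kernel is not symmetric. Since the kernel weights in a CNN are learned, skipping the flip does not change what the layer can represent.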

Cross-correlation calculation

Although the convolutional layer gets its name from the convolution operation, we usually use the more intuitive cross-correlation operation in the convolutional layer. In a two-dimensional convolutional layer, a two-dimensional input array and a two-dimensional kernel array produce a two-dimensional output array through a cross-correlation operation. We use the concrete example shown in the figure to explain the meaning of the two-dimensional cross-correlation operation.

The convolution kernel is also called a filter. If the height and width of the convolution kernel are $k_h$ and $k_w$, it is called a $k_h \times k_w$ convolution. For example, a $3 \times 5$ convolution means that the convolution kernel has a height of 3 and a width of 5.


  • As shown in panel (a) of the figure above: the image on the left has size $3 \times 3$, meaning the input data is a two-dimensional array of dimension $3 \times 3$; the image in the middle has size $2 \times 2$, meaning a two-dimensional array of dimension $2 \times 2$, which we call the convolution kernel. First align the upper-left corner of the convolution kernel with the upper-left corner of the input data (i.e., position (0,0) of the input data), multiply each element of the convolution kernel by the input element at the corresponding position, and then sum the products to get the first result of the convolution output:

$$0 \times 1 + 1 \times 2 + 2 \times 4 + 3 \times 5 = 25 \qquad \text{(a)}$$

Panels (b), (c), and (d) in the figure are calculated in the same way.
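
For reference, the three remaining outputs can be written out in full. The input values used here are inferred from the bias example in the Supplement further down, since the figure itself is not reproduced, so treat them as an assumption:

$$
\begin{aligned}
0 \times 2 + 1 \times 3 + 2 \times 5 + 3 \times 6 &= 31 \qquad \text{(b)} \\
0 \times 4 + 1 \times 5 + 2 \times 7 + 3 \times 8 &= 43 \qquad \text{(c)} \\
0 \times 5 + 1 \times 6 + 2 \times 8 + 3 \times 9 &= 49 \qquad \text{(d)}
\end{aligned}
$$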

The calculation performed by the convolution kernel can be expressed by the following formula, where $a$ is the input image, $b$ is the output feature map, and $w$ is the convolution kernel parameter; all three are two-dimensional arrays.

$$b[i,j] = \sum_{u,v} a[i+u,\, j+v] \cdot w[u,v]$$

For example, the convolution kernel in the figure above has size $2 \times 2$, so $u$ can take the values 0 and 1, and $v$ can also take the values 0 and 1. That is:

$$b[i,j] = a[i+0,j+0] \cdot w[0,0] + a[i+0,j+1] \cdot w[0,1] + a[i+1,j+0] \cdot w[1,0] + a[i+1,j+1] \cdot w[1,1]$$

We can verify the formula by checking that, for each value of $[i,j]$, the result it gives matches the example in the figure above.
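
One way to run that check is a direct translation of the formula into Python. This is a minimal sketch, not a framework implementation: the name `corr2d` is chosen here for illustration, and the input and kernel values are inferred from the worked arithmetic in this post rather than read from the figure.

```python
def corr2d(a, w):
    """Compute b[i][j] = sum over u, v of a[i+u][j+v] * w[u][v] (no padding, stride 1)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(a) - kh + 1, len(a[0]) - kw + 1
    b = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for u in range(kh):
                for v in range(kw):
                    b[i][j] += a[i + u][j + v] * w[u][v]
    return b


# Values inferred from the arithmetic in the text (assumption).
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w = [[0, 1], [2, 3]]

b = corr2d(a, w)
print(b)  # [[25, 31], [43, 49]]
```

The entry `b[0][0]` is the value 25 computed for panel (a); the other three entries correspond to panels (b), (c), and (d).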

Supplement:
In a convolutional neural network, in addition to the convolution process described above, a convolution operator also includes the operation of adding a bias term. For example, assuming that the bias is 1, the result of the above convolution calculation is:
$$0 \times 1 + 1 \times 2 + 2 \times 4 + 3 \times 5 + 1 = 26$$

$$0 \times 2 + 1 \times 3 + 2 \times 5 + 3 \times 6 + 1 = 32$$

$$0 \times 4 + 1 \times 5 + 2 \times 7 + 3 \times 8 + 1 = 44$$

$$0 \times 5 + 1 \times 6 + 2 \times 8 + 3 \times 9 + 1 = 50$$
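
The four biased results above can be reproduced with a short Python sketch. The function name `corr2d_with_bias` is hypothetical, and the input and kernel values are the ones implied by the equations above; the only change from the plain cross-correlation is the scalar added to every output element.

```python
def corr2d_with_bias(a, w, bias):
    """Cross-correlate a with w (no padding, stride 1), then add a scalar bias."""
    kh, kw = len(w), len(w[0])
    return [
        [
            sum(a[i + u][j + v] * w[u][v] for u in range(kh) for v in range(kw)) + bias
            for j in range(len(a[0]) - kw + 1)
        ]
        for i in range(len(a) - kh + 1)
    ]


out = corr2d_with_bias(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # input implied by the equations above
    [[0, 1], [2, 3]],                   # kernel implied by the equations above
    1,                                  # bias
)
print(out)  # [[26, 32], [44, 50]]
```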

Exercise

Now that we have covered the material, let's work through an exercise to consolidate our understanding of the convolution operation.
Problem: count the multiplication and addition operations in a convolution. The input data has shape $[10, 3, 224, 224]$, the convolution kernel has $k_h = k_w = 3$, the number of output channels is 64, the stride is $stride = 1$, and the padding is $p_h = p_w = 1$.
How many multiplication and addition operations does this convolution require in total?

  • Tips
    First look at how many multiplication and addition operations need to be done to output a pixel, and then calculate the total number of operations required.

Solution steps:

  1. First consider a two-dimensional convolution with only one input channel:
    Let the output be B and the input be A, and first calculate one pixel of B,
    [Figure: one pixel of B written as a sum of products]
    which takes a total of 9 multiplications and 8 additions.
    But the images we input are generally three-channel RGB, so we need to compute this for each channel, $B^{(c=0)}_{00}, B^{(c=1)}_{00}, B^{(c=2)}_{00}$. The total number of multiplications is $3 \times 9 = 27$, and the number of additions is $3 \times 8 = 24$.

  2. Then add up the values from these input channels, and add the bias parameter $b$:
    $$B_{00} = B^{(c=0)}_{00} + B^{(c=1)}_{00} + B^{(c=2)}_{00} + b$$
    This introduces 3 additional additions, so the final total number of additions is $24 + 3 = 27$.
    So computing one output pixel requires 27 multiplications and 27 additions.

  3. The output feature map has size $[10, 64, 224, 224]$, so the total number of multiplications required is:
    $$27 \times 10 \times 64 \times 224 \times 224 = 867041280$$
    The number of additions is the same as the number of multiplications, 867041280.
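
The counting steps above can be sketched as a small Python helper. The function name and signature are hypothetical. The output size is computed with the usual formula `(size + 2 * padding - kernel) // stride + 1`, which the post does not state but which gives 224 here. As in the steps above, multiplications against zero padding are counted like any other.

```python
def conv_op_counts(n, c_in, h, w, c_out, kh, kw, stride, ph, pw, bias=True):
    """Return (output shape, total multiplications, total additions) for one conv layer."""
    out_h = (h + 2 * ph - kh) // stride + 1
    out_w = (w + 2 * pw - kw) // stride + 1

    # Per output pixel: one product per kernel element per input channel.
    mults_per_pixel = c_in * kh * kw
    # Per output pixel: (kh*kw - 1) additions inside each channel,
    # (c_in - 1) additions to merge the channels, and 1 more for the bias.
    adds_per_pixel = c_in * (kh * kw - 1) + (c_in - 1) + (1 if bias else 0)

    n_pixels = n * c_out * out_h * out_w
    return (n, c_out, out_h, out_w), mults_per_pixel * n_pixels, adds_per_pixel * n_pixels


shape, mults, adds = conv_op_counts(10, 3, 224, 224, 64, 3, 3, 1, 1, 1)
print(shape)  # (10, 64, 224, 224)
print(mults)  # 867041280
print(adds)   # 867041280
```

Passing `bias=False` drops one addition per output pixel, so the two totals match only when a bias term is included.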

Origin blog.csdn.net/m0_63007797/article/details/128714136