Study Notes 1 - Size Calculation of Convolution Kernel and Feature Map


Foreword

  Because my postgraduate research direction is image processing, I am starting this series to record my own learning process. I am a beginner who is new to the field, so please be kind if you spot mistakes. Discussion is welcome.


1. Convolution

  The essence of convolution is filtering. Filtering extracts the information of interest, as the examples below show.

1. One-dimensional convolution

  "Signal and System" talks about one-dimensional convolution, and the calculation formula is:

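$$ y(t) = (f * h)(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau $$

It can be seen that the convolution of two one-dimensional signals at time t is an overlap area: the signal h is flipped and slid from left to right across the signal f, and at each position t the result is the area under the product of f(τ) and the flipped, shifted h(t - τ), as shown in the figure below.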
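The flip-and-slide description above can be tried out on sampled signals, where the area becomes a sum of products. The following is only a minimal sketch of the discrete case; the function name `convolve_1d` and the sample lists are made up for illustration and do not come from the original notes.

```python
def convolve_1d(f, h):
    """Full discrete convolution: y[n] = sum over k of f[k] * h[n - k]."""
    n_out = len(f) + len(h) - 1
    y = [0] * n_out
    for n in range(n_out):
        for k in range(len(f)):
            j = n - k  # index into h after it has been flipped and shifted by n
            if 0 <= j < len(h):
                y[n] += f[k] * h[j]
    return y


f = [1, 2, 3]  # hypothetical sampled signal f
h = [1, 1]     # hypothetical sampled signal h (a two-tap summing filter)
print(convolve_1d(f, h))  # [1, 3, 5, 3]
```

Each output value is the sum of the products where the flipped, shifted h overlaps f, which is the discrete counterpart of the overlap area.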

2. Two-dimensional convolution

  The essence of convolution, "filtering", is hard to see in one-dimensional convolution, but two-dimensional convolution shows it very clearly, as in the following figure:

The convolution kernel is 3x3 and slides over the image. (Correlation is similar to convolution, with one small difference: flipping the convolution kernel vertically and horizontally and then performing the correlation operation along the sliding direction gives the same result as performing convolution directly.) By choosing a specific convolution kernel you can extract specific features, for example the sharpening filter kernel in the figure below:

It can highlight the details of the image. The output is the center pixel value x 9 minus the sum of the surrounding 8 pixel values, which equals the center pixel value plus the sum of its differences from the 8 surrounding pixels. In other words, the more a pixel differs from its surroundings, the more prominent that detail is after filtering.
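The sharpening arithmetic described above can be checked with a small sketch. The kernel values below (9 in the center, -1 around it) are an assumption inferred from the "center x 9 minus the 8 neighbors" description, since the original figure is not available. The helper names and the tiny test images are hypothetical.

```python
# Assumed sharpening kernel, inferred from the text: center 9, neighbors -1.
SHARPEN = [[-1, -1, -1],
           [-1,  9, -1],
           [-1, -1, -1]]


def flip_kernel(kernel):
    """Flip a kernel vertically and horizontally."""
    return [row[::-1] for row in kernel[::-1]]


def correlate_2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


def convolve_2d(image, kernel):
    """Convolution expressed as correlation with the flipped kernel."""
    return correlate_2d(image, flip_kernel(kernel))


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]  # no detail: center equals its neighbors
spot = [[5, 5, 5], [5, 8, 5], [5, 5, 5]]  # center is 3 brighter than its neighbors
print(convolve_2d(flat, SHARPEN))  # [[5]]  (9*5 - 8*5)
print(convolve_2d(spot, SHARPEN))  # [[32]] (9*8 - 8*5, i.e. 8 + 8*3)
```

A flat region passes through unchanged, while the small difference of 3 at the center is amplified to 27 above the background. Because this kernel is symmetric, flipping it changes nothing, so correlation and convolution give the same output here.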

2. Size calculation of the feature map

  The output of each layer of the network is called a feature map. As the schematic diagram of two-dimensional convolution shows, convolution changes the size of the image. Take the following figure as an example: the convolution kernel is 3x3, the image (the blank squares) is 7x7, the stride of the convolution kernel is s = 2, and the padding is p = 1. The center of the convolution kernel starts aligned with the top-left pixel and then slides. Sliding horizontally yields 4 "pixels" and sliding vertically yields 4 "pixels", so the feature map size is 4x4.
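The sliding process in this example can be simulated by counting how many kernel positions fit along one axis of the padded image. This is only a rough sketch; the function name `count_positions` is hypothetical.

```python
def count_positions(n_in, kernel, stride, padding):
    """Count kernel placements along one axis by actually sliding the window."""
    padded = n_in + 2 * padding  # axis length after padding both sides
    count = 0
    start = 0  # left (or top) edge of the kernel window in padded coordinates
    while start + kernel <= padded:
        count += 1
        start += stride
    return count


# 7x7 image, 3x3 kernel, stride 2, padding 1 (the example in the text)
print(count_positions(7, 3, 2, 1))  # 4
```

With a padded length of 9, the window starts at positions 0, 2, 4 and 6, which gives the 4 output "pixels" per direction.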

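The calculation formula is (the brackets in the formula mean rounding down):

$$ N_{out} = \left\lfloor \frac{N_{in} + 2p - k}{s} \right\rfloor + 1 $$

where N_in is the input size along one direction, k is the convolution kernel size, p is the padding, and s is the stride.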

Substituting the values from the figure into the formula gives floor((7 + 2 x 1 - 3) / 2) + 1 = 4, which matches the result above. If the length and width of the image are different, just calculate the size in each direction separately.
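The formula is easy to wrap in a small helper and apply per direction. This is a minimal sketch; the name `feature_map_size` and the non-square 7x10 input are made up for illustration.

```python
def feature_map_size(n_in, kernel, stride=1, padding=0):
    """Output size along one axis: floor((n_in + 2*padding - kernel) / stride) + 1."""
    # Integer division rounds down, matching the floor brackets in the formula
    # (assumes the kernel is no larger than the padded input).
    return (n_in + 2 * padding - kernel) // stride + 1


# The example from the text: 7x7 image, 3x3 kernel, s = 2, p = 1
print(feature_map_size(7, 3, stride=2, padding=1))  # 4

# A hypothetical non-square input (height 7, width 10): compute each direction separately
out_h = feature_map_size(7, 3, stride=2, padding=1)
out_w = feature_map_size(10, 3, stride=2, padding=1)
print(out_h, out_w)  # 4 5
```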


Origin blog.csdn.net/weixin_45067190/article/details/126155496