Pooling

Basic concepts

In image processing, an image contains a great deal of redundant information, so a statistic of a local sub-block (such as its maximum or mean) can be used to summarize the spatial distribution of all the pixels in that region instead of keeping every individual pixel value. This is the pooling operation in convolutional neural networks.

The pooling operation shrinks the feature map produced by convolution, achieving downsampling while retaining the map's main information. For example, when deciding whether an image shows a human face, we need to know that there is an eye on the left side of the face and another on the right, but we do not need the precise position of either eye. Pooling the pixels in a region gives us exactly this kind of overall statistical characteristic.

Common pooling methods include average pooling, maximum pooling, and K-max pooling.

  • Average pooling: Compute the mean of all pixels in the sub-block and use it as the pooling result. As shown in (a) below, a 2×2 pooling window is used here with a stride of 2; the pixels covered by the window are averaged to obtain the corresponding pixel of the output feature map. The size of the pooling window, also called the pooling size, is written as $k_h \times k_w$. A 2×2 window with a stride of 2 is often used in convolutional neural networks.
  • Maximum pooling: Select the largest pixel value in the sub-block of the input feature map as the pooling result. As shown in (b) below, the maximum of the pixels covered by the pooling window becomes the corresponding pixel of the output feature map; sliding the window across the whole image produces the entire output feature map.

[Figure: (a) average pooling and (b) maximum pooling with a 2×2 window and stride 2]

  • K-max pooling: Take the K largest pixel values in the sub-block of the input feature map; this is often used for text feature extraction in natural language processing. As shown in the figure below, the K-max pooling result is obtained by selecting the 2 largest values from each column of 4 values. (A minimal NumPy sketch of all three operations follows the figure.)

[Figure: K-max pooling, selecting the top 2 values from each column of 4]
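
To make these three operations concrete, here is a minimal NumPy sketch (the helper names avg_pool2d, max_pool2d, and kmax_pool are made up for this illustration, not taken from any library): average and maximum pooling use a 2×2 window with stride 2, and K-max pooling keeps the top K values of each column.

```python
import numpy as np

def avg_pool2d(x, k=2, s=2):
    """Average pooling: mean of each k×k window, moving with stride s."""
    H, W = x.shape
    out = np.empty(((H - k) // s + 1, (W - k) // s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].mean()
    return out

def max_pool2d(x, k=2, s=2):
    """Maximum pooling: largest value of each k×k window, moving with stride s."""
    H, W = x.shape
    out = np.empty(((H - k) // s + 1, (W - k) // s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()
    return out

def kmax_pool(x, k=2):
    """K-max pooling: top-k values of each column, keeping their original order."""
    idx = np.sort(np.argsort(x, axis=0)[-k:, :], axis=0)  # row indices of the k largest values
    return np.take_along_axis(x, idx, axis=0)

x = np.arange(16, dtype=float).reshape(4, 4)
print(avg_pool2d(x))    # 2×2 map of window means
print(max_pool2d(x))    # 2×2 map of window maxima
print(kmax_pool(x, 2))  # top-2 values from each of the 4 columns
```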

Features

1. When the input undergoes a small translation, most of the pooled outputs remain unchanged, so pooling is robust to small positional changes. For example, in the figure below the input matrix is shifted to the right by one pixel; after maximum pooling, the result is the same as before the shift (a small NumPy check follows the figure).

[Figure: the maximum pooling result is unchanged after shifting the input one pixel to the right]
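
As a quick check of this robustness, the following sketch (the 4×4 input matrix is made up for the illustration, and the 2×2, stride-2 pooling is implemented by reshaping) shifts the input one pixel to the right and compares the maximum pooling results.

```python
import numpy as np

def max_pool_2x2(x):
    """2×2 maximum pooling with stride 2 via reshaping (H and W must be even)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

# A made-up 4×4 input, and the same input shifted one pixel to the right
# (a zero column enters on the left, the rightmost column is dropped).
x = np.array([[8., 1., 6., 2.],
              [3., 0., 4., 5.],
              [9., 2., 7., 3.],
              [1., 4., 5., 6.]])
x_shifted = np.hstack([np.zeros((4, 1)), x[:, :-1]])

print(max_pool_2x2(x))          # [[8. 6.] [9. 7.]]
print(max_pool_2x2(x_shifted))  # [[8. 6.] [9. 7.]] -- unchanged by the shift
```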

2. Because the feature map becomes smaller after pooling, a fully connected layer attached afterwards needs fewer neurons, which saves storage space and improves computational efficiency.

How padding works in pooling

int input

An int input n pads the image with n rows and n columns of zeros on every side. To keep the image size unchanged, the value of n depends on the size of the pooling window. Let $H_{in}$ and $W_{in}$ be the input size, $k_h$ and $k_w$ the pooling window size, $p_h$ and $p_w$ the padding, $s_h$ and $s_w$ the stride, and $H_{out}$ and $W_{out}$ the output size. They are related by

$$H_{out} = \frac{H_{in} + 2p_h - k_h}{s_h} + 1, \qquad W_{out} = \frac{W_{in} + 2p_w - k_w}{s_w} + 1$$

With a 3×3 pooling window and a stride of 1, keeping the image size unchanged requires padding = 1; for a 6×6 input the formula becomes

$$H_{out} = \frac{6 - 3 + 2 \cdot 1}{1} + 1 = 6, \qquad W_{out} = \frac{6 - 3 + 2 \cdot 1}{1} + 1 = 6$$

When the stride is not 1 and the division is not exact, the result is rounded down. The padding that preserves the size (with stride 1) is related to the window size k by

$$Padding = \frac{k - 1}{2} \qquad (k \,\%\, 2 \neq 0)$$

Without padding and with a stride of 1, the formula gives $H_{out} = W_{out} = (6 + 0 - 3) + 1 = 4$, so the pooled result is 4×4. What happens if we do pad?

As the formula above shows, $H_{out} = W_{out} = (6 + 2 - 3) + 1 = 6$: with a padding of 1 the image keeps its original size.
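
These two cases can be checked with a tiny helper (pool_out_size is a hypothetical function written only for this example) that applies the formula with floor rounding:

```python
def pool_out_size(h_in, k, p, s):
    """Output size of pooling: floor((h_in + 2p - k) / s) + 1."""
    return (h_in + 2 * p - k) // s + 1

# 6×6 input, 3×3 window, stride 1
print(pool_out_size(6, k=3, p=0, s=1))  # 4 -> without padding the map shrinks to 4×4
print(pool_out_size(6, k=3, p=1, s=1))  # 6 -> padding (k-1)//2 = 1 keeps it 6×6
```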

list and tuple input

Because an image has both a height and a width, the list or tuple must have length 2, with the two values giving the padding in the height and width directions respectively. The output size is computed the same way as for the int input above, separately for each dimension. This form is generally used when the input image's height and width differ, or when the pooling window is not square.
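
A minimal sketch of the per-dimension computation, applying the same formula to height and width separately (all sizes here are made up for the example):

```python
def pool_out_size_2d(hw_in, k, p, s):
    """Apply the pooling size formula to height and width independently."""
    return tuple((i + 2 * pi - ki) // si + 1
                 for i, ki, pi, si in zip(hw_in, k, p, s))

# A non-square 8×6 input with a 3×2 window, padding (1, 0) and stride (1, 2)
print(pool_out_size_2d((8, 6), k=(3, 2), p=(1, 0), s=(1, 2)))  # (8, 3)
```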

string input

A string input takes one of two values, SAME or VALID. Their output sizes are computed as follows:

SAME: $H_{out} = \lceil \frac{H_{in}}{s_h} \rceil$, $W_{out} = \lceil \frac{W_{in}}{s_w} \rceil$

VALID: $H_{out} = \frac{H_{in} - k_h}{s_h} + 1$, $W_{out} = \frac{W_{in} - k_w}{s_w} + 1$

As you can see, VALID is simply the no-padding case and matches the no-padding formula above. SAME, by contrast, does not depend on the pooling window size: if $s_h$ and $s_w$ are both 1, the output feature map keeps the same size as the input regardless of the window size. When either stride is greater than 1, the output size is the input size divided by the stride, rounded up, with padding added as needed when the division is not exact.
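
As a sketch, the two formulas can be written as code (the helper names are illustrative, and the exact behaviour of SAME/VALID padding can vary between frameworks, so this simply follows the formulas above):

```python
import math

def out_size_same(h_in, s):
    """SAME: ceil(h_in / s); independent of the window size."""
    return math.ceil(h_in / s)

def out_size_valid(h_in, k, s):
    """VALID: floor((h_in - k) / s) + 1, i.e. no padding at all."""
    return (h_in - k) // s + 1

print(out_size_same(6, s=1))        # 6 -> size preserved when stride is 1
print(out_size_same(7, s=2))        # 4 -> 7/2 rounded up
print(out_size_valid(6, k=3, s=1))  # 4 -> same as the no-padding case above
```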


Origin blog.csdn.net/weixin_49346755/article/details/127486036