5.4 Pooling layer
In this section we introduce the pooling layer, which was introduced to reduce the convolutional layer's excessive sensitivity to position.
5.4.1 Two-dimensional maximum pooling layer and average pooling layer
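Like the convolutional layer, the pooling layer slides a fixed-shape window (the pooling window) over the input and computes one output per window position. Unlike the convolutional layer, it has no parameters: it directly takes the maximum or the average of the elements in the pooling window. These two operations are called max pooling and average pooling respectively.
The pool2d function below implements the forward computation of the pooling layer.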
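To make the position argument concrete, the following sketch moves a single active pixel by one position and compares the pooled outputs. It is not part of the original text: it assumes torch.nn.functional.max_pool2d with a 2x2 window and its default stride, and the tensor names a and b are illustrative.

```python
import torch
import torch.nn.functional as F

# Two 4x4 inputs with one active pixel each; b is a shifted by one row and one column.
a = torch.zeros(1, 1, 4, 4)
b = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0
b[0, 0, 1, 1] = 1.0

# 2x2 max pooling; the stride defaults to the window size.
pa = F.max_pool2d(a, kernel_size=2)
pb = F.max_pool2d(b, kernel_size=2)

print(torch.equal(pa, pb))  # True: the shift is absorbed by the pooling window
```

The outputs match only because both pixel positions fall inside the same pooling window. A shift that crosses a window boundary would still change the output, so pooling reduces position sensitivity rather than removing it.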
import torch
import torch.nn as nn

def pool2d(X, pool_size, mode='max'):
    # X is a 2-D tensor; pool_size is (window height, window width)
    X = X.float()
    p_h, p_w = pool_size
    # Stride 1 and no padding: one output per valid window position
    Y = torch.zeros(X.shape[0] - p_h + 1, X.shape[1] - p_w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i: i + p_h, j: j + p_w]
            if mode == 'max':
                Y[i, j] = window.max()
            elif mode == 'avg':
                Y[i, j] = window.mean()
            else:
                raise ValueError(f"unknown mode: {mode}")
    return Y
X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
pool2d(X, (2, 2))  # tensor([[4., 5.], [7., 8.]])
Let's also verify average pooling.
pool2d(X, (2, 2), 'avg')  # tensor([[2., 3.], [5., 6.]])
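As a cross-check, the same 3x3 input can be passed through PyTorch's built-in pooling layers. This is a sketch rather than part of the original walkthrough: it assumes a stride of 1 to match the loop-based version, and the names X4, max_out and avg_out are illustrative.

```python
import torch
import torch.nn as nn

X = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]])
# Built-in layers expect (batch, channel, height, width), so add two leading dimensions.
X4 = X.reshape(1, 1, 3, 3)

# stride=1 reproduces a window that moves one element at a time.
max_out = nn.MaxPool2d(kernel_size=2, stride=1)(X4).reshape(2, 2)
avg_out = nn.AvgPool2d(kernel_size=2, stride=1)(X4).reshape(2, 2)

print(max_out.tolist())  # [[4.0, 5.0], [7.0, 8.0]]
print(avg_out.tolist())  # [[2.0, 3.0], [5.0, 6.0]]
```

For example, the top-left window holds 0, 1, 3 and 4, so its maximum is 4 and its average is 2.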
5.4.2 Padding and stride
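Like the convolutional layer, the pooling layer can pad the input and use a stride to change the output shape. We demonstrate this with the MaxPool2d class from the nn module. First we construct an input of shape (1, 1, 4, 4), where the first two dimensions are the batch and the channel.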
X = torch.arange(16, dtype=torch.float).view((1, 1, 4, 4))
X
By default, the stride of a MaxPool2d instance equals the shape of the pooling window. Below we use a pooling window of shape (3, 3), so the default stride is also (3, 3).
pool2d = nn.MaxPool2d(3)
pool2d(X)
We can manually specify the stride and padding.
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
pool2d(X)
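The following sketch shows what this configuration produces on the 4x4 input, and also illustrates one detail that is easy to miss: for max pooling, PyTorch pads implicitly with negative infinity rather than zero, so padding never becomes the maximum. The names pool, Y and neg are illustrative.

```python
import torch
import torch.nn as nn

X = torch.arange(16, dtype=torch.float).reshape(1, 1, 4, 4)
pool = nn.MaxPool2d(3, padding=1, stride=2)

Y = pool(X)
print(tuple(Y.shape))   # (1, 1, 2, 2)
print(Y[0, 0].tolist()) # [[5.0, 7.0], [13.0, 15.0]]

# An all-negative input stays negative: the padded border does not contribute zeros.
neg = -torch.ones(1, 1, 4, 4)
print(pool(neg).max().item())  # -1.0
```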
Of course, we can also specify a non-square pooling window, and specify the padding and stride on the height and width respectively.
pool2d = nn.MaxPool2d((2, 4), padding=(1, 2), stride=(2, 3))
pool2d(X)
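The output shapes in this section follow the usual sliding-window arithmetic. As a sketch, assuming the floor-based rule floor((n + 2p - k) / s) + 1 per dimension (the default, non-ceil mode of MaxPool2d), the hypothetical helper pool_out_size below reproduces the output shapes of the three layers above for the 4x4 input.

```python
def pool_out_size(n, k, p=0, s=None):
    """Output length along one dimension.

    n: input length, k: window length, p: padding on each side, s: stride.
    """
    if s is None:
        s = k  # the stride defaults to the window size
    return (n + 2 * p - k) // s + 1

# nn.MaxPool2d(3): window 3, no padding, default stride 3
print(pool_out_size(4, 3))            # 1
# nn.MaxPool2d(3, padding=1, stride=2)
print(pool_out_size(4, 3, p=1, s=2))  # 2
# nn.MaxPool2d((2, 4), padding=(1, 2), stride=(2, 3)): height, then width
print((pool_out_size(4, 2, p=1, s=2), pool_out_size(4, 4, p=2, s=3)))  # (3, 2)
```

Each dimension is computed independently, which is why a non-square window with different padding and stride per axis needs no special handling.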
5.4.3 Multi-channel
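With multi-channel input, the pooling layer pools each input channel separately instead of summing the results across channels as the convolutional layer does. The number of output channels of a pooling layer therefore equals the number of input channels. Below we concatenate X and X + 1 along the channel dimension to build an input with 2 channels.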
X = torch.cat((X, X + 1), dim=1)
X
After pooling, we find that the number of output channels is still 2.
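The pooling call for this step is not shown in the text. A minimal sketch, assuming the same nn.MaxPool2d(3, padding=1, stride=2) configuration used in section 5.4.2 (the names pool and Y are illustrative), is:

```python
import torch
import torch.nn as nn

X = torch.arange(16, dtype=torch.float).reshape(1, 1, 4, 4)
X = torch.cat((X, X + 1), dim=1)  # shape (1, 2, 4, 4): two input channels

pool = nn.MaxPool2d(3, padding=1, stride=2)
Y = pool(X)

print(tuple(Y.shape))  # (1, 2, 2, 2): still two channels

# The second channel is the first plus 1, and each channel is pooled on its own,
# so the pooled maxima also differ by exactly 1.
print(torch.equal(Y[:, 1], Y[:, 0] + 1))  # True
```

If the channels were summed as in a convolutional layer, the output would have a single channel and this per-channel relationship would be lost.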