(PyTorch Deep Learning Series) Detailed explanation of the output-size calculation formulas for PyTorch convolutional and pooling layers

When designing the structure of a convolutional neural network, the output size of each layer must match the input size of the layer that follows, so it is important to be able to calculate these output sizes correctly.

Let's list the formulas first:

Output size after convolution / after pooling:
(input size - kernel size + 2 * padding) / stride + 1
(input size - pooling window size + 2 * padding) / stride + 1

That is:

The formula for a convolutional neural network is:
N = (W - F + 2P) / S + 1
where
N: output size
W: input size
F: kernel size
P: padding
S: stride
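
For convenience, the formula can be wrapped in a small helper function. This is only an illustrative sketch (conv_out_size is not a PyTorch function); note that PyTorch floors the result, so integer division is used:

import torch
import torch.nn as nn

def conv_out_size(w, f, p=0, s=1):
    # N = (W - F + 2P) / S + 1, floored, exactly as in the formula above
    return (w - f + 2 * p) // s + 1

print(conv_out_size(28, 5))                        # 24

# Cross-check against PyTorch itself
conv = nn.Conv2d(1, 10, kernel_size=5)
print(conv(torch.randn(1, 1, 28, 28)).shape)       # torch.Size([1, 10, 24, 24])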

The Conv2d signature (a worked example explaining the calculation is given later):

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
The parameters of this convolutional layer:
in_channels=3: the number of input channels; an RGB image has 3 channels.
out_channels: the number of output channels (set according to your needs).
kernel_size=12: the kernel is 12x12, i.e. F=12 above.
stride=4: the stride is 4, i.e. S=4 above.
padding=2: the padding is 2, i.e. P=2 above.
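
As a quick illustration of these parameters, here is a minimal sketch; out_channels=64 and the 224*224 input size are assumptions chosen just for the example, not values from the text above:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=12, stride=4, padding=2)
x = torch.randn(1, 3, 224, 224)        # assumed input: one RGB image of 224*224
print(conv(x).shape)                   # (224 - 12 + 2*2)//4 + 1 = 55 -> torch.Size([1, 64, 55, 55])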

Examples:

Conv1d: used for text/sequence data; only the width is convolved, not the height.
Conv2d: used for image data; both the width and the height are convolved.
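
A small sketch contrasting the two (the channel counts and sizes here are arbitrary, chosen only for illustration):

import torch
import torch.nn as nn

# Conv1d slides the kernel along one dimension only (e.g. over a sequence)
conv1d = nn.Conv1d(in_channels=128, out_channels=64, kernel_size=3)
seq = torch.randn(1, 128, 50)          # (batch, embedding_dim, sequence_length)
print(conv1d(seq).shape)               # (50 - 3)//1 + 1 = 48 -> torch.Size([1, 64, 48])

# Conv2d slides the kernel along both the height and the width (e.g. over an image)
conv2d = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
img = torch.randn(1, 3, 32, 32)        # (batch, channels, height, width)
print(conv2d(img).shape)               # torch.Size([1, 16, 30, 30])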

import torch
from torch.autograd import Variable
# torch.autograd provides classes and functions for differentiating arbitrary scalar functions.
# (In modern PyTorch, Variable is deprecated and plain tensors can be used directly.)
import torch.nn as nn
import torch.nn.functional as F

class MNISTConvNet(nn.Module):
    def __init__(self):
        super(MNISTConvNet, self).__init__()
        '''
        This initializes the attributes inherited from the parent class, using the
        parent class's own initializer. In other words, the subclass inherits all
        attributes and methods of the parent class, and the inherited attributes
        are initialized by the parent class's method.
        '''
        # Define the network structure
        self.conv1 = nn.Conv2d(1, 10, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, input):
        x = self.pool1(F.relu(self.conv1(input)))
        # Flatten to a 320-element vector (works here because the batch size is 1)
        x = self.pool2(F.relu(self.conv2(x))).view(320)
        x = self.fc2(self.fc1(x))
        return x

net = MNISTConvNet()
print(net)
input = Variable(torch.randn(1, 1, 28, 28))
out = net(input)
print(out.size())

Let's extract the network-structure part of this example:

        self.conv1 = nn.Conv2d(1, 10, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
    def forward(self, input):
        x = self.pool1(F.relu(self.conv1(input)))
        x = self.pool2(F.relu(self.conv2(x))).view(320)
        x = self.fc2(self.fc1(x))

The network structure is:

conv2d--maxpool2d--conv2d--maxpool2d--fullyconnect--fullyconnect

The input is created by input = Variable(torch.randn(1, 1, 28, 28)), i.e. a single-channel 28*28 image: 1*28*28.

Next, we analyze the input and output of each layer of the network, layer by layer:

(1) Conv2d(1, 10, 5)

N: output size
W: input size	28*28
F: kernel size	5*5
P: padding	0 (default)
S: stride	1 (default)

N = (W - F + 2P)/S + 1 = (28 - 5 + 2*0)/1 + 1 = 24
Output: 10*24*24
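
A quick sanity check of this first layer in code (a minimal sketch):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 10, 5)                      # padding=0, stride=1 by default
print(conv1(torch.randn(1, 1, 28, 28)).shape)    # torch.Size([1, 10, 24, 24])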

Conv2d takes (number of input channels, number of output channels, kernel_size). When the kernel is square, a single number is enough for kernel_size; when it is not square, the height and width must both be written, as follows:

self.conv1 = nn.Conv2d(2, 4, (5,2))
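
With a non-square kernel, the formula is simply applied to the height and the width separately. A minimal sketch (the 20*10 input size is an arbitrary choice for illustration):

import torch
import torch.nn as nn

conv = nn.Conv2d(2, 4, (5, 2))         # kernel: 5 along the height, 2 along the width
x = torch.randn(1, 2, 20, 10)
print(conv(x).shape)                   # height (20-5)+1=16, width (10-2)+1=9 -> torch.Size([1, 4, 16, 9])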

(2) MaxPool2d(2, 2)
MaxPool2d is a max-pooling layer. In a convolutional neural network, the role of the pooling layer is feature fusion and dimensionality reduction. Pooling is a convolution-like operation, but all of the pooling layer's parameters are hyperparameters and are not learned. Max pooling has local invariance: it keeps the salient features and discards the insignificant information, which reduces the number of model parameters and thus helps alleviate overfitting to a certain extent.
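
A tiny sketch of what MaxPool2d(2, 2) does to a 4*4 input: each 2*2 block is replaced by its maximum value.

import torch
import torch.nn as nn

x = torch.tensor([[[[ 1.,  2.,  5.,  6.],
                    [ 3.,  4.,  7.,  8.],
                    [ 9., 10., 13., 14.],
                    [11., 12., 15., 16.]]]])
pool = nn.MaxPool2d(2, 2)
print(pool(x))                         # tensor([[[[ 4.,  8.], [12., 16.]]]])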

class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

N: output size
W: input size	24*24
F: pooling window size	2*2
P: padding	0 (default)
S: stride	2

N = (W - F + 2P)/S + 1 = (24 - 2 + 2*0)/2 + 1 = 12
Output: 10*12*12

(3) Conv2d(10, 20, 5)

N: output size
W: input size	12*12
F: kernel size	5*5
P: padding	0 (default)
S: stride	1 (default)

N = (W - F + 2P)/S + 1 = (12 - 5 + 2*0)/1 + 1 = 8
Output: 20*8*8

(4) MaxPool2d(2, 2)

N: output size
W: input size	8*8
F: pooling window size	2*2
P: padding	0 (default)
S: stride	2

N = (W - F + 2P)/S + 1 = (8 - 2 + 2*0)/2 + 1 = 4
Output: 20*4*4

(5) Fully connected: Linear(320, 50)

Input: 20*4*4 = 320
Output: 50

(6) Fully connected: Linear(50, 10)

Input: 50
Output: 10
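
Putting the whole analysis together, here is a minimal sketch that rebuilds the same layers and prints every intermediate shape:

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1, pool1 = nn.Conv2d(1, 10, 5), nn.MaxPool2d(2, 2)
conv2, pool2 = nn.Conv2d(10, 20, 5), nn.MaxPool2d(2, 2)
fc1, fc2 = nn.Linear(320, 50), nn.Linear(50, 10)

x = torch.randn(1, 1, 28, 28)
x = F.relu(conv1(x)); print(x.shape)   # torch.Size([1, 10, 24, 24])
x = pool1(x);         print(x.shape)   # torch.Size([1, 10, 12, 12])
x = F.relu(conv2(x)); print(x.shape)   # torch.Size([1, 20, 8, 8])
x = pool2(x);         print(x.shape)   # torch.Size([1, 20, 4, 4])
x = x.view(320);      print(x.shape)   # torch.Size([320])
x = fc2(fc1(x));      print(x.shape)   # torch.Size([10])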

Therefore, the data flow of the entire example during the forward pass is:

    def forward(self, input):
        x = self.pool1(F.relu(self.conv1(input)))
        x = self.pool2(F.relu(self.conv2(x))).view(320)
        x = self.fc2(self.fc1(x))

The role of the activation function ReLU in the neural network: applying a non-linearity to the weighted inputs produces non-linear decision boundaries. Simply put, it increases the non-linearity of the model. In deep convolutional neural networks, ReLU is used as the activation function mainly because it avoids the vanishing-gradient problem caused by the sigmoid function.

Origin blog.csdn.net/c2a2o2/article/details/109282583