3D Convolution Study Notes

What is the difference between a 2D convolution kernel and a 3D convolution kernel?

What is the difference between a two-dimensional convolution kernel and a three-dimensional convolution kernel in a convolutional neural network? - Zhihu

License plate recognition using 3D convolution:

LPRNet_Pytorch/LPRNet.py at master · sirius-ai/LPRNet_Pytorch · GitHub

Detailed explanation of 3d convolution:

Detailed explanation of 3D convolution - Programmer Sought
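As a quick sanity check on the kernel-shape difference the linked articles discuss, here is a minimal PyTorch sketch (the channel counts 5 and 7 are arbitrary illustration values, not from the articles):

```python
import torch.nn as nn

# Conv2d weights: (out_channels, in_channels, kH, kW)
c2 = nn.Conv2d(in_channels=5, out_channels=7, kernel_size=3)

# Conv3d weights gain a depth axis: (out_channels, in_channels, kD, kH, kW)
c3 = nn.Conv3d(in_channels=5, out_channels=7, kernel_size=3)

print(tuple(c2.weight.shape))  # (7, 5, 3, 3)
print(tuple(c3.weight.shape))  # (7, 5, 3, 3, 3)
```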

import numpy as np
import torch
import torch.nn as nn

# 2 input channels, 3 output channels, kernel (depth, h, w) = (2, 3, 3).
# bias=False so the manual spot check below matches the output exactly.
f = nn.Conv3d(2, 3, (2, 3, 3), stride=(1, 1, 1), padding=(0, 0, 0), bias=False)
i = torch.Tensor(np.array(range(0, 200)).reshape(1, 2, 4, 5, 5))
a = torch.Tensor(range(0, 108)).reshape(3, 2, 2, 3, 3)

f.weight = torch.nn.Parameter(a)
output = f(i)  # shape (1, 3, 3, 3, 3)



# output[0, 1, 2, 2, 1] should equal the matching input window weighted by filter 1:
y01221 = torch.sum(a[1, :, :, :, :] * i[0, :, 2:4, 2:5, 1:4])
g01221 = output[0, 1, 2, 2, 1]

As can be seen from the above code:

  1. The input i has shape (1, 2, 4, 5, 5), meaning batch=1, feature (channel)=2, depth=4, h=5, w=5.
  2. The filter f has weight shape (3, 2, 2, 3, 3):
  • The leading 3 comes from the second argument of Conv3d (out_channels): there are 3 filters of size (2, 3, 3).
  • The following 2 comes from the first argument of Conv3d (in_channels) and must match the feature dimension of the input i. This works the same way as in Conv2d: if a 2D kernel is declared as 3x3 and the input has 5 feature channels, the actual kernel shape is (5, 3, 3), because the kernel has to cover every input channel.
  • The remaining three numbers (2, 3, 3) are the kernel extents: each step covers 2 cells in the depth dimension, 3 cells in the h dimension, and 3 cells in the w dimension.
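The list above pins down the shapes; the standard output-size formula, sketched in plain Python, confirms why the output is (1, 3, 3, 3, 3). The helper name conv_out_size is ours, not part of PyTorch:

```python
def conv_out_size(n, k, stride=1, padding=0):
    """Output length along one axis for a standard convolution."""
    return (n + 2 * padding - k) // stride + 1

# Input (1, 2, 4, 5, 5), kernel (2, 3, 3), stride 1, no padding:
d = conv_out_size(4, 2)  # depth:  4 - 2 + 1 = 3
h = conv_out_size(5, 3)  # height: 5 - 3 + 1 = 3
w = conv_out_size(5, 3)  # width:  5 - 3 + 1 = 3
print((1, 3, d, h, w))   # (1, 3, 3, 3, 3) -- batch 1, 3 filters
```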


Working backwards from output[0, 1, 2, 2, 1], we can recover which filter and which input window produced it. The channel index 1 means this is the second 3D filter, so we write 1 into the first index of a to pick out that filter's weights, a[1]. The depth index 2 means the filter has slid to its third stop along the depth dimension (the windows are 0:2 on the first stop, 1:3 on the second, 2:4 on the third), and every feature channel is taken, i.e. concatenate((i[0, 0, 2:4, 2:5, 1:4], i[0, 1, 2:4, 2:5, 1:4]), 0) = i[0, :, 2:4, 2:5, 1:4]. The h and w dimensions work the same way: the h index 2 is the third stop (2:5), and the w index 1 is the second stop (1:4, since with width 5 and kernel width 3 the windows are 0:3, 1:4, 2:5).
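The single spot check above can be extended to every output position. A minimal self-contained sketch, assuming bias is disabled so the equality is exact:

```python
import torch
import torch.nn as nn

f = nn.Conv3d(2, 3, (2, 3, 3), stride=1, padding=0, bias=False)
f.weight = nn.Parameter(torch.arange(108, dtype=torch.float32).reshape(3, 2, 2, 3, 3))
i = torch.arange(200, dtype=torch.float32).reshape(1, 2, 4, 5, 5)
out = f(i)  # shape (1, 3, 3, 3, 3)

# Re-derive every output element as a windowed sum with the matching filter.
ok = True
for c in range(3):              # output channel -> which filter
    for d in range(3):          # depth stop -> slice d:d+2
        for y in range(3):      # h stop -> slice y:y+3
            for x in range(3):  # w stop -> slice x:x+3
                manual = torch.sum(f.weight[c] * i[0, :, d:d+2, y:y+3, x:x+3])
                ok = ok and torch.allclose(out[0, c, d, y, x], manual)
print(ok)  # True
```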


Source: blog.csdn.net/jacke121/article/details/123675306