DataWhale Team Learning Task05-1: Convolutional Neural Network Basics (Convolution)

Convolutional Neural Network Basics

Two-dimensional convolutional layer
This section introduces the two-dimensional convolutional layer, the most common type of convolutional layer, which is widely used to process image data.

Two-dimensional cross-correlation operation
The two-dimensional cross-correlation operation takes a two-dimensional input array and a two-dimensional kernel array and produces a two-dimensional output array. The kernel array is usually called the convolution kernel or filter. The kernel is generally smaller than the input array. The kernel slides over the input array; at each position, the kernel is multiplied element-wise with the input subarray at that position and the products are summed, giving the element at the corresponding position of the output array. Figure 1 shows an example of the cross-correlation operation, where the shaded parts are the first computation region of the input, the corresponding kernel elements, and the resulting output element.
Figure 1: The two-dimensional cross-correlation operation
Below we implement the two-dimensional cross-correlation operation in the corr2d function, which takes an input array X and a kernel array K and returns the output array Y.

import torch 
import torch.nn as nn

def corr2d(X, K):
    H, W = X.shape
    h, w = K.shape
    # The output shrinks by the kernel size minus one in each dimension
    Y = torch.zeros(H - h + 1, W - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Element-wise product of the input window with the kernel, then sum
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

We construct the input array X and the kernel array K from the figure above to verify the output of the two-dimensional cross-correlation operation.

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = torch.tensor([[0, 1], [2, 3]])
Y = corr2d(X, K)
print(Y)

tensor([[19., 25.],
        [37., 43.]])

Two-dimensional convolutional layer
A two-dimensional convolutional layer cross-correlates the input with the kernel and adds a scalar bias to produce the output. The model parameters of a convolutional layer are the convolution kernel and the scalar bias.

class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super(Conv2D, self).__init__()
        # Both the kernel and the scalar bias are learnable parameters
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

Let's look at an example. We construct a 6 × 8 image whose middle 4 columns are black (0) and the rest white (1), and we want to detect the color edges. Our label is a 6 × 7 two-dimensional array whose 2nd column is 1 (the edge from 1 to 0) and whose 6th column is -1 (the edge from 0 to 1).

X = torch.ones(6, 8)
Y = torch.zeros(6, 7)
X[:, 2: 6] = 0
Y[:, 1] = 1
Y[:, 5] = -1
print(X)
print(Y)

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])
tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])
We want to learn a 1 × 2 convolutional layer that detects the color edges through convolution.

conv2d = Conv2D(kernel_size=(1, 2))
step = 30
lr = 0.01
for i in range(step):
    Y_hat = conv2d(X)
    l = ((Y_hat - Y) ** 2).sum()
    l.backward()
    # Gradient descent
    conv2d.weight.data -= lr * conv2d.weight.grad
    conv2d.bias.data -= lr * conv2d.bias.grad
    
    # Zero the gradients
    conv2d.weight.grad.zero_()
    conv2d.bias.grad.zero_()
    if (i + 1) % 5 == 0:
        print('Step %d, loss %.3f' % (i + 1, l.item()))
        
print(conv2d.weight.data)
print(conv2d.bias.data)

Step 5, loss 4.569
Step 10, loss 0.949
Step 15, loss 0.228
Step 20, loss 0.060
Step 25, loss 0.016
Step 30, loss 0.004
tensor([[ 1.0161, -1.0177]])
tensor([0.0009])

Cross-correlation and convolution
The convolutional layer is named after the convolution operation, but the layer actually uses cross-correlation rather than convolution. In a true convolution, the kernel array is flipped upside down and left to right before being cross-correlated with the input array. Since the kernel of a convolutional layer is learned from data, there is no essential difference between using convolution and using cross-correlation, so convolutional layers simply use cross-correlation.
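As a quick sketch (reusing the corr2d function defined above), a true convolution can be obtained by flipping the kernel before the cross-correlation; the helper name conv2d_true is ours, chosen for illustration:

def conv2d_true(X, K):
    # Flip the kernel upside down and left to right, then cross-correlate
    return corr2d(X, torch.flip(K, dims=[0, 1]))

X = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]])
K = torch.tensor([[0., 1.], [2., 3.]])
print(conv2d_true(X, K))  # differs from corr2d(X, K) unless K is symmetric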

Feature map and receptive field
The two-dimensional output array of a two-dimensional convolutional layer can be regarded as a representation of the input in the spatial dimensions (width and height), and is also called a feature map. The receptive field of an element x consists of all the elements of all possible inputs (which may be larger than the actual input) that can affect the computation of x in the forward pass.

Taking Figure 1 as an example, the four shaded input elements form the receptive field of the shaded output element. If we denote the 2 × 2 output in the figure as Y and cross-correlate Y with another 2 × 2 kernel array to output a single element z, then the receptive field of z on Y includes all four elements of Y, while its receptive field on the input includes all nine input elements. Thus, deeper convolutional neural networks broaden the receptive field of individual elements in the feature map, capturing features of the input over larger regions.
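The stacked computation described above can be sketched with corr2d; the second kernel K2 here is a hypothetical example, not taken from the figure:

X = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]])
K1 = torch.tensor([[0., 1.], [2., 3.]])  # first-layer 2 x 2 kernel
K2 = torch.ones(2, 2)                    # hypothetical second-layer 2 x 2 kernel
Y = corr2d(X, K1)   # shape (2, 2); each element sees 4 input elements
z = corr2d(Y, K2)   # shape (1, 1); z depends on all 4 elements of Y,
print(z)            # and hence on all 9 elements of X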

Padding and stride
We now introduce two hyperparameters of the convolutional layer, padding and stride. Given the shapes of the input and the convolution kernel, they can be used to change the shape of the output.

Padding
Padding refers to adding extra elements (usually zeros) on both sides of the input along the height and width. In Figure 2, elements of value 0 are added on both sides of the original input's height and width.
Figure 2: Two-dimensional cross-correlation with zero elements padded on both sides of the input's height and width
If the input's height and width are n_h and n_w, the kernel's height and width are k_h and k_w, a total of p_h rows are padded on the two sides of the height, and a total of p_w columns are padded on the two sides of the width, then the shape of the output is:

(n_h + p_h − k_h + 1) × (n_w + p_w − k_w + 1)
In convolutional neural networks we usually use kernels with odd height and width, such as 3 × 3 or 5 × 5 kernels.
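As a quick sanity check of the shape formula (with an arbitrary random input), note that nn.Conv2d's padding argument counts rows/columns per side, so padding=1 corresponds to p_h = p_w = 2 in the formula:

X = torch.rand(1, 1, 8, 8)                        # n_h = n_w = 8
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # k = 3, p = 2 in total
print(conv(X).shape)  # torch.Size([1, 1, 8, 8]): 8 + 2 - 3 + 1 = 8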

Stride
In the cross-correlation operation, the convolution kernel slides over the input array; the number of rows and columns moved per slide is called the stride. So far we have used strides of 1. Figure 3 shows a two-dimensional cross-correlation operation with a stride of 3 on the height and 2 on the width.
Figure 3: Two-dimensional cross-correlation with strides of 3 and 2 on the height and width, respectively
In general, when the stride on the height is s_h and the stride on the width is s_w, the shape of the output is:

⌊(n_h + p_h − k_h + s_h) / s_h⌋ × ⌊(n_w + p_w − k_w + s_w) / s_w⌋

If p_h = k_h − 1 and p_w = k_w − 1, the output shape simplifies to:

⌊(n_h + s_h − 1) / s_h⌋ × ⌊(n_w + s_w − 1) / s_w⌋
Furthermore, if the input's height and width are divisible by the strides on the height and width, the output shape will be (n_h / s_h) × (n_w / s_w).
When p_h = p_w = p, we say the padding is p; when s_h = s_w = s, we say the stride is s.
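A similar check for the stride formula, again with an arbitrary random input:

X = torch.rand(1, 1, 8, 8)
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)
print(conv(X).shape)  # torch.Size([1, 1, 4, 4]): floor((8 + 2 - 3 + 2) / 2) = 4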

Multiple input channels and multiple output channels
The inputs and outputs so far have all been two-dimensional arrays, but real data often has higher dimensionality. For example, a color image has two extra RGB (red, green, blue) color channels besides its height and width. Assuming a color image's height and width are h and w (in pixels), it can be represented as a 3 × h × w multi-dimensional array. We call this dimension of size 3 the channel dimension.

Multiple input channels
A convolutional layer can have multiple input channels. Figure 4 shows an example of two-dimensional cross-correlation with two input channels.
Figure 4: Cross-correlation computation with two input channels
Suppose the input data has c_i channels and the convolution kernel has shape k_h × k_w. We assign each input channel its own kernel array of shape k_h × k_w, perform the c_i two-dimensional cross-correlation computations, and add the c_i two-dimensional outputs channel by channel to obtain a single two-dimensional array as the output. Concatenating the c_i kernel arrays along the channel dimension yields a convolution kernel of shape c_i × k_h × k_w.
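A minimal sketch of multi-input-channel cross-correlation, reusing corr2d; the helper name corr2d_multi_in and the example values are ours, chosen for illustration:

def corr2d_multi_in(X, K):
    # Cross-correlate each input channel with its own kernel, then sum the results
    return sum(corr2d(x, k) for x, k in zip(X, K))

X = torch.tensor([[[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]],
                  [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]])
K = torch.tensor([[[0., 1.], [2., 3.]], [[1., 2.], [3., 4.]]])
print(corr2d_multi_in(X, K))  # tensor([[ 56.,  72.], [104., 120.]])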

Multiple output channels
A convolutional layer can also have multiple output channels. Suppose the convolution kernel's numbers of input and output channels are c_i and c_o, and its height and width are k_h and k_w. To obtain an output with multiple channels, we create a kernel array of shape c_i × k_h × k_w for each output channel and concatenate them along the output channel dimension, giving a convolution kernel of shape c_o × c_i × k_h × k_w.

One way to understand a multi-output-channel kernel: each c_i × k_h × k_w kernel array extracts one particular local feature. Since the input may contain a rich variety of features, we need several such c_i × k_h × k_w kernel arrays, each extracting a different feature.
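Building on the sketch above, multiple output channels just stack one multi-input-channel result per output kernel; corr2d_multi_in_out is again an illustrative helper, and X and K are reused from the previous sketch:

def corr2d_multi_in_out(X, K):
    # K has shape (c_o, c_i, k_h, k_w); compute one output channel per kernel
    return torch.stack([corr2d_multi_in(X, k) for k in K])

K3 = torch.stack([K, K + 1, K + 2])      # a 3-output-channel kernel for illustration
print(K3.shape)                          # torch.Size([3, 2, 2, 2])
print(corr2d_multi_in_out(X, K3).shape)  # torch.Size([3, 2, 2])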

1×1 convolutional layer
Finally, we discuss convolution kernels of shape 1 × 1. We usually call such a convolution a 1×1 convolution, and a convolutional layer containing such a kernel a 1×1 convolutional layer. Figure 5 shows a cross-correlation computation using a 1×1 convolution kernel with 3 input channels and 2 output channels.
Figure 5: Cross-correlation computation with a 1×1 convolution kernel. The input and output have the same height and width

A 1×1 convolution can adjust the number of channels without changing the height and width. The 1×1 convolution kernel does not recognize patterns formed by adjacent elements along the height and width dimensions; its main computation happens along the channel dimension. If we regard the channel dimension as the feature dimension and the elements along the height and width as data samples, then a 1×1 convolutional layer acts like a fully connected layer.
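This equivalence can be verified with a small sketch: implement the 1×1 convolution as a matrix multiplication over the channel dimension and compare it with the multi-channel cross-correlation above (the helper names are ours):

def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.view(c_i, h * w)   # each pixel becomes a sample, channels are features
    K = K.view(c_o, c_i)     # 1x1 kernels reduce to a c_o x c_i weight matrix
    return torch.mm(K, X).view(c_o, h, w)  # a fully connected layer per pixel

X = torch.rand(3, 3, 3)
K = torch.rand(2, 3, 1, 1)
Y1 = corr2d_multi_in_out_1x1(X, K)
Y2 = corr2d_multi_in_out(X, K)
print((Y1 - Y2).norm().item() < 1e-6)  # True: both computations agree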

Convolutional layers versus fully connected layers
Two-dimensional convolutional layers are often used to process images. Compared with the fully connected layers discussed earlier, they have two main advantages.

First, a fully connected layer flattens the image into a vector, so elements adjacent in the input image may no longer be adjacent after flattening, which makes it hard for the network to capture local information. The design of the convolutional layer naturally gives it the ability to extract local information.

Second, a convolutional layer uses fewer parameters. Ignoring the bias, a convolution kernel of shape (c_i, c_o, h, w) has c_i × c_o × h × w parameters, regardless of the input image's height and width. If the input and output shapes of a convolutional layer are (c_1, h_1, w_1) and (c_2, h_2, w_2), then connecting them with a fully connected layer instead would require c_1 × h_1 × w_1 × c_2 × h_2 × w_2 parameters. A convolutional layer can therefore handle larger images with far fewer parameters.
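A concrete comparison with illustrative numbers (a 3-channel 64 × 64 input mapped to a 16-channel 64 × 64 output):

c1, h1, w1 = 3, 64, 64
c2, h2, w2 = 16, 64, 64
conv_params = c1 * c2 * 3 * 3                # 3 x 3 kernel: 432 parameters
fc_params = (c1 * h1 * w1) * (c2 * h2 * w2)  # 805,306,368 parameters
print(conv_params, fc_params)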

Simple implementation of a convolutional layer
We use PyTorch's nn.Conv2d class to implement a two-dimensional convolutional layer, focusing on the following constructor arguments:

  • in_channels (python:int) – Number of channels in the input image
  • out_channels (python:int) – Number of channels produced by the convolution
  • kernel_size (python:int or tuple) – Size of the convolving kernel
  • stride (python:int or tuple, optional) – Stride of the convolution. Default: 1
  • padding (python:int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

The forward function takes as input a four-dimensional tensor of shape (N, C_in, H_in, W_in) and returns a four-dimensional tensor of shape (N, C_out, H_out, W_out), where N is the batch size and C, H, W denote the number of channels, the height, and the width.
Code example:

X = torch.rand(4, 2, 3, 5)
print(X.shape)

conv2d = nn.Conv2d(in_channels=2, out_channels=3, kernel_size=(3, 5), stride=1, padding=(1, 2))
Y = conv2d(X)
print('Y.shape: ', Y.shape)
print('weight.shape: ', conv2d.weight.shape)
print('bias.shape: ', conv2d.bias.shape)

torch.Size([4, 2, 3, 5])
Y.shape: torch.Size([4, 3, 3, 5])
weight.shape: torch.Size([3, 2, 3, 5])
bias.shape: torch.Size([3])

Pooling
Two-dimensional pooling layer
Pooling layers mainly serve to relieve the convolutional layer's excessive sensitivity to position. Like a convolutional layer, a pooling layer computes one output element at a time for a fixed-shape window (known as the pooling window) over the input data, but the pooling layer directly takes the maximum or the average of the elements inside the pooling window; these operations are called max pooling and average pooling, respectively. Figure 6 shows max pooling with a 2 × 2 pooling window.
Figure 6: Max pooling with a 2 × 2 pooling window

Two-dimensional average pooling works like two-dimensional max pooling, with the max operator replaced by the average operator. A pooling layer whose pooling window has shape p × q is called a p × q pooling layer, and its pooling operation is called p × q pooling.
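A from-scratch sketch of two-dimensional pooling, analogous to corr2d above (the helper name pool2d is ours):

def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = torch.zeros(X.shape[0] - p_h + 1, X.shape[1] - p_w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i: i + p_h, j: j + p_w]
            # Max pooling takes the window maximum; average pooling takes the mean
            Y[i, j] = window.max() if mode == 'max' else window.mean()
    return Y

X = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]])
print(pool2d(X, (2, 2)))         # tensor([[4., 5.], [7., 8.]])
print(pool2d(X, (2, 2), 'avg'))  # tensor([[2., 3.], [5., 6.]])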

A pooling layer can also pad both sides of the input's height and width and adjust the stride of the moving window to change the output shape. Padding and stride in a pooling layer work the same way as padding and stride in a convolutional layer.

When processing multi-channel input data, the pooling layer pools each input channel separately, rather than summing the results over channels as a convolutional layer does. This means that a pooling layer's number of output channels equals its number of input channels.

Simple implementation of a pooling layer
We use PyTorch's nn.MaxPool2d to implement a max pooling layer, with the following constructor arguments:

  • kernel_size – the size of the window to take a max over
  • stride – the stride of the window. Default value is kernel_size
  • padding - implicit zero padding to be added on both sides
The forward function takes as input a four-dimensional tensor of shape (N, C, H_in, W_in) and returns a four-dimensional tensor of shape (N, C, H_out, W_out), where N is the batch size and C, H, W denote the number of channels, the height, and the width.
Code example:
X = torch.arange(32, dtype=torch.float32).view(1, 2, 4, 4)
pool2d = nn.MaxPool2d(kernel_size=3, padding=1, stride=(2, 1))
Y = pool2d(X)
print(X)
print(Y)

tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]],

         [[16., 17., 18., 19.],
          [20., 21., 22., 23.],
          [24., 25., 26., 27.],
          [28., 29., 30., 31.]]]])
tensor([[[[ 5.,  6.,  7.,  7.],
          [13., 14., 15., 15.]],

         [[21., 22., 23., 23.],
          [29., 30., 31., 31.]]]])
For average pooling, use nn.AvgPool2d; its usage is the same as that of nn.MaxPool2d.
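For example, replacing the max pooling layer above with average pooling (same constructor arguments) gives an output of the same shape:

pool2d = nn.AvgPool2d(kernel_size=3, padding=1, stride=(2, 1))
Y = pool2d(torch.arange(32, dtype=torch.float32).view(1, 2, 4, 4))
print(Y.shape)  # torch.Size([1, 2, 2, 4])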
