Hands-on deep learning notes - Convolutional neural networks (1)

Thanks to the Boyu learning platform, which I am using for free, so consider this a small plug for them.
These are mainly study notes; the problem-solving parts of this blog can be skipped so as not to waste time and effort.

Basics of convolutional neural networks

This section introduces the basic concepts of convolutional neural networks, with the convolutional layer and the pooling layer as the main components, and explains the meaning of padding, stride, input channels, and output channels.

Two-dimensional cross-correlation operation

The formal explanation: a two-dimensional cross-correlation operation takes a two-dimensional input array and a two-dimensional kernel array as input and produces a two-dimensional output array, where the kernel array is usually called the convolution kernel or filter. The convolution kernel is generally smaller than the input array; it slides over the input array, and at each position it is multiplied element-wise with the input subarray at that position and summed to give the element at the corresponding position of the output array. Figure 1 shows an example of the cross-correlation operation; the shaded parts are the first computation region of the input, the corresponding kernel array elements, and the corresponding output element.

Figure 1: two-dimensional cross-correlation operation
In plain terms, my understanding is that the convolution kernel is laid over the input array and an element-wise product is taken (note: this is not a matrix multiplication) and then summed. In the figure above, the first output element is computed as 0×0 + 1×1 + 3×2 + 4×3 = 19.
We can implement the two-dimensional cross-correlation operation in PyTorch with a function corr2d, which takes an input array X and a kernel array K and returns the output array Y.

import torch 
import torch.nn as nn

def corr2d(X, K):
    # X is the input array, K is the convolution kernel
    # H, W are the numbers of rows and columns of X; h, w those of K
    H, W = X.shape
    h, w = K.shape
    Y = torch.zeros(H - h + 1, W - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

We construct the input array X and the kernel array K from the figure above to verify the output of the cross-correlation operation.

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = torch.tensor([[0, 1], [2, 3]])
Y = corr2d(X, K)
print(Y)

Output: tensor([[19., 25.], [37., 43.]])

Two-dimensional convolutional layer

A two-dimensional convolutional layer cross-correlates the input with the convolution kernel and adds a scalar bias to obtain the output. The model parameters of a convolutional layer are the convolution kernel and the scalar bias.

class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super(Conv2D, self).__init__()
        # the kernel and the scalar bias are the layer's learnable parameters
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

Let's look at an example: we construct a 6 × 8 image whose middle 4 columns are black (0) and whose remaining columns are white (1). We want to detect the color edges. Our label is a 6 × 7 two-dimensional array whose 2nd column is 1 (edge from 1 to 0) and whose 6th column is -1 (edge from 0 to 1).

X = torch.ones(6, 8)
Y = torch.zeros(6, 7)
X[:, 2: 6] = 0
Y[:, 1] = 1
Y[:, 5] = -1
print(X)
print(Y)

Output
tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])
tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])
We now hope to learn a 1 × 2 convolutional layer that detects the color edges through cross-correlation with the input.

conv2d = Conv2D(kernel_size=(1, 2))
step = 30  # number of training iterations
lr = 0.01  # learning rate
for i in range(step):
    Y_hat = conv2d(X)
    l = ((Y_hat - Y) ** 2).sum()
    l.backward()
    # gradient descent
    conv2d.weight.data -= lr * conv2d.weight.grad
    conv2d.bias.data -= lr * conv2d.bias.grad
    
    # zero the gradients
    conv2d.weight.grad.zero_()
    conv2d.bias.grad.zero_()
    if (i + 1) % 5 == 0:
        print('Step %d, loss %.3f' % (i + 1, l.item()))
        
print(conv2d.weight.data)
print(conv2d.bias.data)

Output
Step 5, loss 4.569
Step 10, loss 0.949
Step 15, loss 0.228
Step 20, loss 0.060
Step 25, loss 0.016
Step 30, loss 0.004
tensor([[ 1.0161, -1.0177]])
tensor([0.0009])

Padding and stride

Padding and stride are two hyperparameters of the convolutional layer. Given the shape of the input and the shape of the convolution kernel, they can be used to change the shape of the output.

Padding

Padding means adding extra elements (usually 0s) on both sides of the input along the height and width. In Figure 2, elements of value 0 are added on both sides of the original input along the height and width.
Figure 2: two-dimensional cross-correlation computation with elements of 0 padded on both sides of the height and width of the input

If the original input height and width are nh and nw, the kernel height and width are kh and kw, a total of ph rows are padded across the two sides of the height, and a total of pw columns are padded across the two sides of the width, then the output shape is:

(nh + ph − kh + 1) × (nw + pw − kw + 1)

Convolutional neural networks often use kernels with odd height and width, such as 3 × 3 or 5 × 5. For a kernel of size 2k + 1 along the height (or width), with a stride of 1, padding k elements on both sides of the height (or width) keeps the input and output the same size.
In short, padding fills extra elements around the input array; for example, padding = 1 pads one full row of 0 elements above and below the input.
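As a quick sanity check, here is a minimal sketch using nn.Conv2d (the input shape below is chosen arbitrarily for illustration): a 3 × 3 kernel with padding of 1 on each side and a stride of 1 leaves the height and width unchanged.

# 3x3 kernel (size 2k+1 with k=1), padding=1 on each side, stride 1:
# the output keeps the same height and width as the input
X_pad = torch.rand(1, 1, 8, 8)   # batch of 1, 1 channel, 8x8 input
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
print(conv(X_pad).shape)         # torch.Size([1, 1, 8, 8])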

Stride

In the cross-correlation operation, the convolution kernel slides over the input array; the number of rows and columns it moves at each slide is called the stride. So far we have always used a stride of 1. Figure 3 shows a two-dimensional cross-correlation operation with a stride of 3 along the height and 2 along the width.
In other words, the window moves 2 cells at a time horizontally and 3 cells at a time vertically.
Figure 3: two-dimensional cross-correlation computation with strides of 3 and 2 along the height and width

In general, when the stride along the height is sh and the stride along the width is sw, the output shape is:

⌊(nh + ph − kh + sh)/sh⌋ × ⌊(nw + pw − kw + sw)/sw⌋

If ph = kh − 1 and pw = kw − 1, the output shape simplifies to ⌊(nh + sh − 1)/sh⌋ × ⌊(nw + sw − 1)/sw⌋. Going a step further, if the input height and width are divisible by the strides along the height and width, the output shape is (nh/sh) × (nw/sw).

When ph = pw = p, we say the padding is p; when sh = sw = s, we say the stride is s.

There is also a general formula: output size = (H − F + 2P)/S + 1, where H is the input size, F is the convolution kernel size, P is the padding added on each side, and S is the stride.
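To illustrate, this formula can be wrapped in a small helper function (a hypothetical utility written for these notes, not part of PyTorch); here P is the padding added on each side:

def conv_output_size(H, F, P=0, S=1):
    # output size along one dimension: floor((H - F + 2P) / S) + 1
    return (H - F + 2 * P) // S + 1

# Example: the 6x8 edge-detection input with the 1x2 kernel, no padding, stride 1 -> 6x7
print(conv_output_size(6, 1), conv_output_size(8, 2))  # 6 7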

Multiple input channels and multiple output channels

So far the inputs and outputs have been two-dimensional arrays, but real data often has higher dimensions. For example, besides height and width, a color image has three RGB (red, green, blue) color channels. If a color image has height h and width w (in pixels), it can be represented as a 3 × h × w multidimensional array; we call the dimension of size 3 the channel dimension.

Multiple input channels, single output channel

The input of a convolutional layer can contain multiple channels. Figure 4 shows an example of a two-dimensional cross-correlation computation with two input channels.
Suppose the number of input channels is ci and the kernel shape is kh × kw. We assign a kernel array of shape kh × kw to each input channel, perform a two-dimensional cross-correlation per channel, and add the ci results to obtain a two-dimensional output array. Concatenating the ci kernel arrays along the channel dimension gives a convolution kernel of shape ci × kh × kw.
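A minimal sketch of this multi-input-channel computation, reusing the corr2d function defined above (the helper name corr2d_multi_in is just a label used in these notes):

def corr2d_multi_in(X, K):
    # X: ci x H x W input, K: ci x kh x kw kernel;
    # cross-correlate channel by channel and sum the ci results
    return sum(corr2d(X[i], K[i]) for i in range(X.shape[0]))

X_mi = torch.tensor([[[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]],
                     [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]])
K_mi = torch.tensor([[[0., 1.], [2., 3.]], [[1., 2.], [3., 4.]]])
print(corr2d_multi_in(X_mi, K_mi))  # tensor([[ 56.,  72.], [104., 120.]])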

Multiple input and multiple output channels

The output of a convolutional layer can also contain multiple channels. Let the numbers of input and output channels of the convolution kernel be ci and co, and let its height and width be kh and kw. To obtain an output with multiple channels, we create a kernel array of shape ci × kh × kw for each output channel and concatenate them along the output-channel dimension, giving a convolution kernel of shape co × ci × kh × kw.

One way to understand output channels: a single ci × kh × kw kernel array can only extract one kind of local feature, while the input may contain rich features, so we need multiple such ci × kh × kw kernel arrays, each extracting a different feature.
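Building on the sketch above, a multi-output version can stack one such result per output channel (again, the helper names are just labels for these notes):

def corr2d_multi_in_out(X, K):
    # K: co x ci x kh x kw; apply corr2d_multi_in once per output channel
    # and stack the co results along a new output-channel dimension
    return torch.stack([corr2d_multi_in(X, k) for k in K])

K_mo = torch.stack([K_mi, K_mi + 1, K_mi + 2])   # shape 3 x 2 x 2 x 2
print(corr2d_multi_in_out(X_mi, K_mo).shape)     # torch.Size([3, 2, 2])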

1 × 1 convolutional layer

Finally we discuss convolution kernels of shape 1 × 1. We usually call this a 1 × 1 convolution, and a convolutional layer that uses such kernels a 1 × 1 convolutional layer. Figure 5 shows the cross-correlation computation of a 1 × 1 convolution kernel with 3 input channels and 2 output channels.

Figure 5: cross-correlation computation with a 1 × 1 convolution kernel. The input and the output have the same height and width

A 1 × 1 convolution adjusts the number of channels without changing the height and width. Because the kernel spans only a single element along the height and width, it cannot recognize patterns formed by adjacent elements in those dimensions; its computation happens mainly along the channel dimension. If we regard the channel dimension as the feature dimension and the elements along the height and width as data samples, a 1 × 1 convolutional layer acts like a fully connected layer.
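As a sketch of this equivalence (the shapes below are chosen arbitrarily), a 1 × 1 convolution can be computed as a matrix multiplication over the channel dimension, treating every spatial position as a sample:

def corr2d_multi_in_out_1x1(X, K):
    # X: ci x h x w, K: co x ci x 1 x 1
    ci, h, w = X.shape
    co = K.shape[0]
    X = X.view(ci, h * w)      # each of the h*w positions becomes a "sample"
    K = K.view(co, ci)         # the kernel acts like a fully connected weight matrix
    Y = torch.mm(K, X)         # computation happens only along the channel dimension
    return Y.view(co, h, w)

X_11 = torch.rand(3, 4, 4)
K_11 = torch.rand(2, 3, 1, 1)
Y_11 = corr2d_multi_in_out_1x1(X_11, K_11)
print(torch.allclose(Y_11, corr2d_multi_in_out(X_11, K_11), atol=1e-6))  # True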

Comparison of convolutional layers and fully connected layers

Two-dimensional convolutional layers are often used to process images. Compared with the fully connected layers discussed earlier, they have two main advantages:

First, a fully connected layer flattens the image into a vector, so elements that were adjacent in the input image may no longer be adjacent after flattening, which makes it hard for the network to capture local information. The design of the convolutional layer naturally gives it the ability to extract local information.

Second, a convolutional layer uses fewer parameters. Ignoring the bias, a kernel of shape (ci, co, h, w) has ci × co × h × w parameters, regardless of the height and width of the input image. If the input and output shapes of a convolutional layer are (c1, h1, w1) and (c2, h2, w2), a fully connected layer making the same connection would need c1 × c2 × h1 × w1 × h2 × w2 parameters. Convolutional layers can therefore handle larger images with far fewer parameters.
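To make the comparison concrete, here is the parameter count for both cases under some arbitrarily chosen shapes (bias terms ignored):

c1, h1, w1 = 3, 32, 32      # input:  3 channels, 32 x 32
c2, h2, w2 = 16, 30, 30     # output: 16 channels, 30 x 30
kh, kw = 3, 3               # 3 x 3 kernel

conv_params = c1 * c2 * kh * kw                # 432, independent of the image size
fc_params = (c1 * h1 * w1) * (c2 * h2 * w2)    # 44,236,800
print(conv_params, fc_params)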

Simple convolution layer implementation

In PyTorch, nn.Conv2d provides a simple implementation of the convolutional layer; the main parameters to note are the following:
in_channels (python:int) – Number of channels in the input image
out_channels (python:int) – Number of channels produced by the convolution
kernel_size (python:int or tuple) – Size of the convolving kernel
stride (python:int or tuple, optional) – Stride of the convolution. Default: 1
padding (python:int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

The forward function takes a four-dimensional tensor of shape (N, Cin, Hin, Win) as input and returns a four-dimensional tensor of shape (N, Cout, Hout, Wout), where N is the batch size and C, H, W denote the number of channels, the height, and the width.

X = torch.rand(4, 2, 3, 5)
print(X.shape)

conv2d = nn.Conv2d(in_channels=2, out_channels=3, kernel_size=(3, 5), stride=1, padding=(1, 2))
Y = conv2d(X)
print('Y.shape: ', Y.shape)
print('weight.shape: ', conv2d.weight.shape)
print('bias.shape: ', conv2d.bias.shape)

Output
torch.Size([4, 2, 3, 5])
Y.shape: torch.Size([4, 3, 3, 5])
weight.shape: torch.Size([3, 2, 3, 5])
bias.shape: torch.Size([3])

Pooling

A pooling layer is mainly used to alleviate the excessive sensitivity of a convolutional layer to position. Like a convolutional layer, a pooling layer computes output elements over a fixed-shape window of the input (also called the pooling window) each time; it directly takes the maximum or the average of the elements in the pooling window, and these operations are called max pooling and average pooling respectively. Figure 6 shows max pooling with a 2 × 2 pooling window.
Whether max pooling or average pooling, the idea is simply to take the maximum or the average of the part of the input array covered by the pooling window.
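A minimal from-scratch sketch of pooling in the same style as corr2d above (the name pool2d and the fixed stride of 1 are just choices for this example):

def pool2d(X, pool_size, mode='max'):
    # X: 2-D input array; pool_size: (p_h, p_w) pooling window; stride fixed at 1
    p_h, p_w = pool_size
    Y = torch.zeros(X.shape[0] - p_h + 1, X.shape[1] - p_w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i: i + p_h, j: j + p_w]
            Y[i, j] = window.max() if mode == 'max' else window.mean()
    return Y

X_pool = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]])
print(pool2d(X_pool, (2, 2)))          # tensor([[4., 5.], [7., 8.]])
print(pool2d(X_pool, (2, 2), 'avg'))   # tensor([[2., 3.], [5., 6.]])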

Simple pooling layer implementation

We use nn.MaxPool2d in PyTorch to implement a max pooling layer, with the following constructor parameters:

kernel_size – the size of the window to take a max over
stride – the stride of the window. Default value is kernel_size
padding – implicit zero padding to be added on both sides
The forward function takes a four-dimensional tensor of shape (N, C, Hin, Win) as input and returns a four-dimensional tensor of shape (N, C, Hout, Wout), where N is the batch size and C, H, W denote the number of channels, the height, and the width.

X = torch.arange(32, dtype=torch.float32).view(1, 2, 4, 4)
pool2d = nn.MaxPool2d(kernel_size=3, padding=1, stride=(2, 1))
Y = pool2d(X)
print(X)
print(Y)

Output
tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]],

         [[16., 17., 18., 19.],
          [20., 21., 22., 23.],
          [24., 25., 26., 27.],
          [28., 29., 30., 31.]]]])
tensor([[[[ 5.,  6.,  7.,  7.],
          [13., 14., 15., 15.]],

         [[21., 22., 23., 23.],
          [29., 30., 31., 31.]]]])

The average pooling layer uses nn.AvgPool2d, which is used in the same way as nn.MaxPool2d, as shown in the sketch below.
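A minimal check on the same X as above (only the output shape is printed here):

avg_pool2d = nn.AvgPool2d(kernel_size=3, padding=1, stride=(2, 1))
print(avg_pool2d(X).shape)  # torch.Size([1, 2, 2, 4]), same shape as the max pooling output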
