Convolutional neural network basics

A convolutional neural network is mainly built from three kinds of network layers: the convolution layer, the pooling layer, and the fully connected layer. Each of the three is explained below.

1. Convolution layer

Before introducing the convolution layer, we first need to understand the concept of a window (filter). When the network computes, it traverses the image with this window. A picture is made of pixels; below is a 4 * 4 matrix, and for now we ignore the case of three RGB channels, so the matrix represents a single channel.

1 2 1 1
4 5 1 1
1 1 1 1
1 1 1 11

Then assume a window (filter) given as a 3 * 3 matrix. This is a common window for detecting vertical edges.

1 0 -1
1 0 -1
1 0 -1

To compute, we take the nine numbers in the top-left corner of the image matrix, multiply them elementwise by the window's numbers, and add the products up. The result here is 1 - 1 + 4 - 1 + 1 - 1 = 3. Sliding the window over the whole image gives the following result matrix.

3 5
3 -6

Here we performed a simple calculation.
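To make the sliding-window computation concrete, here is a minimal NumPy sketch (my own illustration, not code from the original post; the function name cross_correlate is just an arbitrary choice) that reproduces the numbers above.

import numpy as np

# The 4 * 4 single-channel image from the example above (bottom-right value is 11).
image = np.array([[1, 2, 1, 1],
                  [4, 5, 1, 1],
                  [1, 1, 1, 1],
                  [1, 1, 1, 11]])

# The 3 * 3 vertical-edge window (filter).
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

def cross_correlate(img, k):
    # Slide the window over the image with stride 1 and no padding,
    # multiplying elementwise and summing at each position.
    fh, fw = k.shape
    oh, ow = img.shape[0] - fh + 1, img.shape[1] - fw + 1
    out = np.zeros((oh, ow), dtype=img.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + fh, j:j + fw] * k)
    return out

print(cross_correlate(image, kernel))
# [[ 3  5]
#  [ 3 -6]]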

Next, we introduce two more concepts: padding and stride.

Padding

From the calculation above, we can see that a number in the upper-left corner of the matrix is used only once, while a number near the center such as the 5 is used four times. To address this imbalance, we introduce padding: adding a border of zeros around the original image. Below is the padding = 1 case.

0 0 0 0 0 0
0 1 2 1 1 0
0 4 5 1 1 0
0 1 1 1 1 0
0 1 1 1 11 0
0 0 0 0 0 0

Padding commonly comes in two modes: one is "valid", which means no padding at all; the other is "same", which pads so that the output matrix has exactly the same size as the original picture. For "same" padding (with stride 1), p = (f - 1) / 2, where f is the filter size.
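As a small illustration (mine, not from the original post), zero-padding the example image with p = 1 can be done with NumPy's np.pad:

import numpy as np

image = np.array([[1, 2, 1, 1],
                  [4, 5, 1, 1],
                  [1, 1, 1, 1],
                  [1, 1, 1, 11]])

f = 3                    # filter size
p_same = (f - 1) // 2    # "same" padding for stride 1: p = (f - 1) / 2 = 1
padded = np.pad(image, p_same, mode='constant', constant_values=0)
print(padded)            # the 6 * 6 zero-padded matrix shown above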

Stride

So far the window has moved one cell to the right at each step. Sometimes we do not want that, so we introduce the stride: the number of cells the window moves at each step. The calculation above used a stride of 1.
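Filter size, padding and stride together determine the output size. As a quick reference (my addition, using the standard output-size formula rather than anything stated in the post): for an n * n input, an f * f filter, padding p and stride s, each side of the output has floor((n + 2p - f) / s) + 1 values.

def output_size(n, f, p=0, s=1):
    # Output height/width for an n x n input, f x f filter, padding p, stride s.
    return (n + 2 * p - f) // s + 1

print(output_size(4, 3))         # 2 -> the 2 * 2 result computed above
print(output_size(4, 3, p=1))    # 4 -> "same" padding keeps the 4 * 4 size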

Note

Strictly speaking, the calculation above is not a convolution in the mathematical sense but a cross-correlation. A true convolution first flips the filter by 180 degrees and then performs the same computation. This does not affect our neural networks; most work on neural networks does not bother with the distinction.
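A quick check of this point, assuming SciPy is available (again my own sketch, not code from the post): what a deep-learning "convolution" layer computes corresponds to SciPy's correlate2d, and a true convolution equals cross-correlation with the kernel flipped 180 degrees.

import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.array([[1, 2, 1, 1],
                  [4, 5, 1, 1],
                  [1, 1, 1, 1],
                  [1, 1, 1, 11]])
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# What a deep-learning "convolution" layer computes is cross-correlation:
xcorr = correlate2d(image, kernel, mode='valid')
# True mathematical convolution equals cross-correlation with a 180-degree-flipped kernel:
true_conv = convolve2d(image, kernel, mode='valid')
flipped_xcorr = correlate2d(image, np.flip(kernel), mode='valid')
print(np.array_equal(true_conv, flipped_xcorr))   # True
print(xcorr)                                      # the [[3 5], [3 -6]] result used above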

The numbers in the convolution window are worth a closer look. Our example above detects vertical edges; in the same way we can write down filters that detect horizontal edges and other similar features. In an actual network, the numbers in the filter are the parameters the network learns.

The window size is usually odd: on the one hand this gives the window a center, and on the other hand it makes the padding size easy to determine.

The example above uses a single channel. A common picture has 3 RGB channels, and the filter's channel count is the same as the picture's.
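For intuition, here is a shape-only sketch (the sizes are hypothetical values I chose, not from the post): an RGB image has 3 channels, so each filter also has depth 3, and the channels are summed away so that one filter produces a single-channel feature map.

import numpy as np

rgb_image = np.random.rand(6, 6, 3)    # height x width x 3 RGB channels
rgb_filter = np.random.rand(3, 3, 3)   # the filter's depth matches the image's 3 channels

# One output value: multiply a 3 x 3 x 3 patch elementwise by the filter and sum everything.
patch = rgb_image[0:3, 0:3, :]
value = np.sum(patch * rgb_filter)     # a single number; the 3 channels are summed together
# Stacking, say, 8 such filters would give an output feature map with 8 channels.
print(value)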

2. Pooling layer

The pooling layer works differently from the convolution layer: it helps the network find more of the same features. The two common kinds are max pooling and average pooling. A pooling layer also has a window, but as the window moves we take the maximum or the average of the image values under it.

The stride and padding concepts apply equally to pooling layers, so they are not elaborated here.

Suppose the pooling layer uses a max filter of size 2 * 2 with s = 2; then the result of pooling our original image matrix is:

5 1
1 11
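The same result can be reproduced with a short NumPy sketch (again my own illustration, not code from the post):

import numpy as np

image = np.array([[1, 2, 1, 1],
                  [4, 5, 1, 1],
                  [1, 1, 1, 1],
                  [1, 1, 1, 11]])

def max_pool(img, f=2, s=2):
    # 2 x 2 max pooling with stride 2: take the largest value inside each window.
    oh, ow = (img.shape[0] - f) // s + 1, (img.shape[1] - f) // s + 1
    out = np.zeros((oh, ow), dtype=img.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = img[i * s:i * s + f, j * s:j * s + f].max()
    return out

print(max_pool(image))
# [[ 5  1]
#  [ 1 11]]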

3. Fully connected layer

This layer works like a layer of an ordinary neural network after a Flatten step: the high-dimensional image is turned into a one-dimensional vector, which is then passed on to the next computation.

There are many ways to turn a high-dimensional image into a one-dimensional vector: one is to use a 1 * 1 convolution layer, another is to flatten the numbers directly.
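A minimal sketch of the flatten-then-fully-connected step (the shapes 2 * 2 * 8 and 10 output units are hypothetical values I chose for illustration):

import numpy as np

feature_map = np.random.rand(2, 2, 8)   # say the last pooling layer produced a 2 x 2 x 8 volume

# Flatten the volume into a one-dimensional vector.
vector = feature_map.reshape(-1)        # shape (32,)

# A fully connected layer is then a matrix multiply plus a bias.
W = np.random.rand(10, vector.size)     # 10 output units
b = np.random.rand(10)
output = W @ vector + b
print(vector.shape, output.shape)       # (32,) (10,)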

4. Activation function

In fact, each of the processes above is followed by an activation function. When a convolution layer produces a 2 * 2 matrix like the one above, that result plays the role of W multiplied by a; we add a constant b, apply an activation function such as ReLU, and then pass the result to the next layer.
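Continuing the 2 * 2 example from the convolution section, here is a small sketch of this step (the bias value 1.0 is an arbitrary choice of mine):

import numpy as np

z = np.array([[3, 5],
              [3, -6]])        # the convolution output, playing the role of W * a

b = 1.0                        # bias constant
a_next = np.maximum(z + b, 0)  # ReLU activation applied elementwise
print(a_next)
# [[4. 6.]
#  [4. 0.]]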

5. Why are convolutional neural networks effective?

There are two main reasons:

1. Parameter sharing: the same filter parameters are used to compute every part of the picture, which is very effective for finding similar features anywhere in the image.

2. Sparse connections: each output value depends only on a small number of inputs, which is a great help in preventing over-fitting. A rough parameter count below makes both points concrete.
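Here is that count (my own example with made-up layer sizes, not numbers from the post): a small convolution layer on a 32 * 32 * 3 input uses far fewer parameters than a fully connected layer on the same input, because its filters are shared across every position of the image.

# 8 filters of size 3 x 3 x 3, each with one bias, shared across the whole image:
conv_params = 3 * 3 * 3 * 8 + 8        # 224 parameters
# A fully connected layer from the 32 * 32 * 3 = 3072 input values to 100 units:
fc_params = 32 * 32 * 3 * 100 + 100    # 307,300 parameters
print(conv_params, fc_params)          # 224 307300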


Origin www.cnblogs.com/siyuan-Jin/p/12391695.html