Understanding Convolutional Neural Networks (CNNs)

Consider a convolutional neural network applied to an image of size m×m pixels, so the input consists of m×m values. If the first layer has n neurons and is fully connected to the input, there are m×m×n connections in total, and therefore m×m×n parameters.
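As a quick sanity check on the fully connected count, here is the arithmetic with assumed sizes (m=28, n=100 are illustrative, not from the text):

```python
# Parameter count for a fully connected layer on an m×m image
# with n first-layer neurons: every neuron sees every pixel.
m, n = 28, 100                     # assumed image side and neuron count
fully_connected_params = m * m * n
print(fully_connected_params)      # 78400
```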

First, we use sparse connections: each neuron is connected to only a small patch of the image, say k×k pixels, where k < m. Ignoring for now exactly how the patches are laid out, both the number of connections and the number of parameters drop to k×k×n.

Second, we apply weight sharing: every neuron uses the same set of parameters on its k×k patch. The number of connections is still k×k×n, but the number of parameters is reduced to just k×k.
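The two reductions can be compared side by side. This sketch uses assumed sizes (m=28, k=5, n=100), chosen only for illustration:

```python
# Parameter counts for the three schemes described above.
m, k, n = 28, 5, 100

fully_connected = m * m * n   # every neuron connected to every pixel
sparse = k * k * n            # each neuron sees only a k×k patch
shared = k * k                # all neurons share a single k×k weight set

print(fully_connected, sparse, shared)  # 78400 2500 25
```

With these numbers, sparse connectivity already cuts the parameters by a factor of about 31, and weight sharing cuts them by a further factor of n.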

So far we have explained everything in terms of neurons. Now picture the neurons arranged as a two-dimensional matrix instead of a one-dimensional vector; this shift is exactly what makes CNNs natural for images. Likewise, treat the shared parameters (weights) as a two-dimensional matrix, called a kernel, and apply it with the convolution operation. Corresponding to the earlier idea of connecting each neuron to a k×k patch of the image, we define a kernel of size k×k and slide it across the image, computing a convolution at each position. Assuming the kernel moves one pixel at a time (that is, the stride is 1), the output is a square matrix (image) with side length (m − k)/1 + 1. In terms of the earlier neuron picture, the number of neurons is no longer n but ((m − k)/1 + 1)². Note: in general, the side length of the output image (assuming the image and the kernel are both square) is (image_size − kernel_size)/stride + 1.
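The sliding-kernel picture above can be sketched directly. This is a minimal, unoptimized valid convolution with stride 1; the sizes (28×28 image, 5×5 kernel) are assumed for illustration:

```python
import numpy as np

def conv_output_size(image_size, kernel_size, stride=1):
    # Side length of the output: (image_size - kernel_size) / stride + 1
    return (image_size - kernel_size) // stride + 1

def conv2d(image, kernel):
    # Naive valid convolution of a square image with a square kernel,
    # stride 1: slide the kernel over every position and sum the products.
    m, k = image.shape[0], kernel.shape[0]
    out = conv_output_size(m, k)
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            result[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return result

image = np.random.rand(28, 28)   # assumed m = 28
kernel = np.random.rand(5, 5)    # assumed k = 5
feature_map = conv2d(image, kernel)
print(feature_map.shape)         # (24, 24), i.e. ((28 - 5)/1 + 1)²  neurons
```

Each entry of `feature_map` plays the role of one neuron in the earlier description, all of them sharing the same k×k kernel.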

At this point only one image is output, i.e. we obtain a single feature map (the convolution operation described in the previous paragraph produces one feature map). In practice, a convolutional layer in an image CNN usually produces many feature maps. Suppose we generate M maps, i.e. M square matrices (images) of side length (m − k)/1 + 1. How many parameters does this require? As noted before, one map needs as many parameters as the kernel has entries, i.e. k×k, so the total number of parameters is now k×k×M.
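Extending the earlier count to M maps is a one-line calculation; the values (k=5, M=6) are assumed for illustration:

```python
# With M kernels of size k×k, one per feature map, the layer
# needs k*k*M parameters in total (biases ignored, as in the text).
k, M = 5, 6
multi_map_params = k * k * M
print(multi_map_params)  # 150
```

Even with several feature maps, the parameter count stays tiny compared with a fully connected layer of the same input size, which is the whole point of the construction.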

These are the core concepts behind the convolutional layer in a CNN: sparse interactions, weight sharing, and multiple feature maps.
