Convolutional Neural Networks (2): Sparse Interactions, Receptive Field and Parameter Sharing

Sparse Interactions, Receptive Field and Parameter Sharing are the core ideas of the CNN architecture, and this article analyzes their principles in detail.

First, consider a feedforward neural network: the output of layer L is the matrix product of the input of layer L and the weight matrix of layer L, followed by a nonlinear transformation. In other words, every output unit of layer L depends on every input unit of layer L. If the input is m-dimensional and the output is n-dimensional, there are m*n weights characterizing the relationship between input and output, so the time complexity of forward propagation is O(m*n).
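As a minimal sketch of this dense case (the dimensions and variable names below are illustrative, not from the article), a NumPy forward pass makes the m*n cost explicit:

    import numpy as np

    # One fully connected layer: every output depends on every input.
    m, n = 784, 128                 # illustrative input/output dimensions
    x = np.random.randn(m)          # input of layer L
    W = np.random.randn(n, m)       # dense weight matrix: m*n parameters
    b = np.zeros(n)

    s = np.tanh(W @ x + b)          # matrix product, then a nonlinearity
    print(s.shape)                  # (128,) -- the O(m*n) product dominates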

According to practical experience in machine learning, the main problems caused by too many weights are difficulty of training, overfitting, and so on. CNNs therefore introduce Sparse Interactions to solve the problem of dense weights. Take a network with a kernel width of 3 as an example: the output s_3 of the next layer is related only to the 3 inputs x_2, x_3 and x_4.
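A sparse interaction of this kind can be sketched in a few lines (the concrete values below are illustrative):

    import numpy as np

    # Kernel width k = 3: the output s_3 depends only on x_2, x_3, x_4.
    x = np.array([0.5, -1.2, 0.3, 0.8, -0.4, 1.1])   # inputs x_1 .. x_6
    w = np.array([0.2, -0.5, 0.7])                    # kernel of width 3

    # The receptive field of s_3 is {x_2, x_3, x_4};
    # indices shift by one because Python arrays start at 0.
    s3 = np.tanh(w @ x[1:4])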

x_2, x_3 and x_4 are called the Receptive Field of s_3, a concept imported from biological vision. Here is a quote from Wikipedia about the receptive field:

“Work by Hubel and Wiesel in the 1950s and 1960s showed that cat and monkey visual cortexes contain neurons that individually respond to small regions of the visual field. Provided the eyes are not moving, the region of visual space within which visual stimuli affect the firing of a single neuron is known as its receptive field. Neighboring cells have similar and overlapping receptive fields. Receptive field size and location varies systematically across the cortex to form a complete map of visual space. The cortex in each hemisphere represents the contralateral visual field.”

In the visual cortex (Visual Cortex) of the brain, located in the occipital lobe (Occipital Lobe), there are neurons each of which corresponds to a receptive field. This is where the Biological Neural Network inspired the Artificial Neural Network for image processing. If we stopped here, we would already have reduced the complexity of the model, shrinking the weight matrix to k*n, where k is the size of the kernel. That is to say, there are n neurons in the next layer, and each neuron is related to only k values of the previous layer.
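A sparse but still unshared layer of this kind (sometimes called a locally connected layer) can be sketched as follows; sizes are illustrative. Each output keeps its own kernel, so there are k*n parameters instead of m*n:

    import numpy as np

    # Sparse connectivity without weight sharing: k*n parameters.
    k, m = 3, 8
    n = m - k + 1                    # one output per "valid" window position
    x = np.random.randn(m)
    W = np.random.randn(n, k)        # a separate width-k kernel per output

    s = np.array([np.tanh(W[i] @ x[i:i+k]) for i in range(n)])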

However, introducing the concept of Weight Sharing simplifies the model further, so that the number of weights is related only to the size of the kernel. Kernel and Weight Sharing can be understood as follows: there is no fixed connection between layer L and layer L-1; instead, a dynamic binding is adopted. Between the two layers there is a small window, called the kernel. Through this window you can see a small part of the original image, and as the window slides from left to right and from top to bottom, the entire image is scanned. Each scanned image region is convolved with the kernel to generate a feature map. For the whole image, the shared weights are exactly the values of the kernel: if the kernel changes, the scan result (the feature map) of the entire image changes. The process can be sketched as follows.
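Here is a minimal sketch of that sliding-window scan (image and kernel sizes are illustrative): a single 3x3 kernel, i.e. only 9 shared weights, is reused at every position of the image.

    import numpy as np

    # Weight sharing: the same kernel scans the whole image.
    image = np.random.randn(6, 6)
    kernel = np.random.randn(3, 3)   # the shared weights: only 9 parameters

    out_h = image.shape[0] - kernel.shape[0] + 1
    out_w = image.shape[1] - kernel.shape[1] + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):           # slide the window top to bottom
        for j in range(out_w):       # and left to right
            patch = image[i:i+3, j:j+3]
            feature_map[i, j] = np.sum(patch * kernel)
    # Changing the kernel changes the entire feature map.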

At this point, I personally think we have answered the first W, namely What problem can be solved; as for the Why and the How, a lot of space and time are still needed to learn and analyze them step by step. This is the second article in the CNN series, to be continued.
