CNN pooling layer and convolution layer backpropagation

Reference: https://blog.csdn.net/kyang624823/article/details/78633897

Backpropagation through the convolution layer and the pooling layer:

1. CNN forward propagation

a) For a convolution layer, the kernel is multiplied element-wise with the corresponding positions of the input and the products are summed, giving the value at the corresponding position of the output. If the input X is an M × N matrix and the kernel is a × b, the output Y has size (M − a + 1) × (N − b + 1) (see the sketch after this list).

b) For a pooling layer, the input tensor is shrunk according to the standard pooling rule.
c) For fully connected layers, the forward computation is the same as in an ordinary feedforward network.
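
As a quick illustration of a), here is a minimal NumPy sketch (my own, not code from the original post) of the "valid" convolution described above: the kernel slides over the input, the element-wise products are summed, and the output indeed comes out as (M − a + 1) × (N − b + 1).

```python
import numpy as np

def conv2d_valid(x, w):
    """Valid 'convolution' as used in CNN forward passes (no kernel flip)."""
    M, N = x.shape
    a, b = w.shape
    out = np.zeros((M - a + 1, N - b + 1))
    for i in range(M - a + 1):
        for j in range(N - b + 1):
            # element-wise product of the kernel with the patch, then sum
            out[i, j] = np.sum(x[i:i + a, j:j + b] * w)
    return out

x = np.arange(16.0).reshape(4, 4)   # input X, M = N = 4
w = np.ones((2, 2))                 # kernel, a = b = 2
print(conv2d_valid(x, w).shape)     # (3, 3) = (M - a + 1, N - b + 1)
```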

2. What is different in CNN backpropagation:

First, note that in an ordinary neural network the input/output z of each layer is just a vector, whereas in a CNN z is a three-dimensional tensor, i.e. it is made up of several input sub-matrices. Second:

  1. The pooling layer has no activation function. This is easy to deal with: we can take the pooling layer's activation to be σ(z) = z, i.e. the activation of the input is the input itself, so the derivative of the pooling layer's activation function is 1.
  2. During forward propagation the pooling layer compresses its input, so when we derive the previous layer's error in the backward pass we need an upsample step to undo that compression.
  3. A convolution layer obtains its output by convolving its input tensor with kernels and summing over the input channels, rather than by the plain matrix multiplication an ordinary network uses to obtain its output, so the recursion that carries the error back to the previous layer is necessarily different.
  4. For a convolution layer, since the convolution is carried out with a shared kernel W, the way this layer's error is used to derive the gradients of its W and b also differs.

Since a convolution layer may have several kernels, and each kernel is handled in exactly the same way and independently of the others, to keep the formulas simple we derive everything below for a single kernel, as if the convolution layer had only one. Next, let us look at the concrete steps of CNN backpropagation.

3. Given the error of a pooling layer, derive the error of the previous hidden layer

During forward propagation the pooling layer shrinks an input region of known size with MAX or Average pooling. Now we go the other way and restore the error of the larger region in the previous layer from the pooled error. This process is called upsample. We assume the pooling region is 2×2 and that δ_k^l, the error matrix of the k-th sub-matrix of layer l, is known.

If the pooling region is a × a, we expand the matrix by a − 1 rows and columns on each side (top, bottom, left and right) to restore it to its original size.

Backpropagation through a pooling layer must keep the sum of the transmitted loss (or gradient) unchanged. Under this principle, backpropagation differs between mean pooling and max pooling.

MAX POOLING:

For MAX pooling, suppose the forward pass recorded the positions of the maxima as upper-left, lower-right, upper-right and lower-left (one per pooling region); the converted matrix then has each error value placed at its recorded position and zeros everywhere else.

For example:

Max pooling must also satisfy the principle that the gradient sum stays the same. In the forward pass, max pooling passes only the maximum value of each patch on to the next layer and simply discards the other pixel values. In the backward pass, the gradient is therefore passed straight back to the one pixel that held the maximum, and every other pixel in the patch receives no gradient, i.e. gets 0. Max pooling thus differs from mean pooling in one extra requirement: we need to know which pixel was the maximum, so the forward pooling operation records max_id, the location of the maximum value, because backpropagation needs it. The forward and backward passes then look as follows:
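
A minimal NumPy sketch (my own illustration, not the post's code) of max pooling backprop for a single 2×2 patch: the whole incoming gradient goes to the position recorded as max_id in the forward pass, every other position gets 0, and the gradient sum is preserved.

```python
import numpy as np

patch = np.array([[1.0, 3.0],
                  [2.0, 0.5]])      # input to one 2x2 pooling window (forward pass)
max_id = np.unravel_index(np.argmax(patch), patch.shape)  # recorded location of the max

grad_out = 0.8                      # gradient arriving at the pooled output value
grad_patch = np.zeros_like(patch)
grad_patch[max_id] = grad_out       # only the max position receives the gradient
print(grad_patch)
# [[0.  0.8]
#  [0.  0. ]]  -> the gradient sum (0.8) is unchanged
```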

 

Average POOLING:

For Average pooling, the error is spread by averaging: each error value is distributed evenly over its pooling region in the converted matrix.

For example:

In the forward pass, mean pooling passes the average of each patch on to the next layer. In the backward pass, the gradient of a pooled element is therefore split into n equal parts and assigned to the corresponding positions of the previous layer, which keeps the sum of the gradients (residuals) before and after pooling unchanged, as follows:
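
The matching sketch (again my own assumed illustration) for mean pooling backprop on one 2×2 patch: the incoming gradient is split into n = 4 equal parts, so the total gradient before and after pooling is the same.

```python
import numpy as np

grad_out = 0.8                       # gradient at the pooled (averaged) output value
pool_h, pool_w = 2, 2
n = pool_h * pool_w                  # n = 4 elements per pooling region
grad_patch = np.full((pool_h, pool_w), grad_out / n)
print(grad_patch)                    # every entry is 0.2; the total is still 0.8
```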

 

The matrix obtained this way is the current layer's error matrix after upsample; the error of the previous layer is then derived as

δ^{l-1} = upsample(δ^l) ⊙ σ'(z^{l-1})

The backward recursion of an ordinary network is very similar:

δ^{l-1} = (W^l)^T δ^l ⊙ σ'(z^{l-1})

It can be seen that only the first factor differs: the matrix product (W^l)^T δ^l is replaced by upsample(δ^l).
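
Putting the pieces together, here is a sketch of the full recursion δ^{l-1} = upsample(δ^l) ⊙ σ'(z^{l-1}) for a 2×2 average-pooling layer. The ReLU-style previous layer, so that σ'(z) is 1 where z > 0 and 0 elsewhere, is my assumption for illustration only.

```python
import numpy as np

delta = np.array([[0.4, -0.2],
                  [0.1,  0.3]])                  # error delta^l of the pooling layer
z_prev = np.random.randn(4, 4)                   # pre-activation z^{l-1} of the previous layer

# upsample: spread each pooled error evenly over its 2x2 region (average pooling)
upsampled = np.kron(delta, np.ones((2, 2))) / 4.0
# elementwise product with sigma'(z^{l-1}); sigma assumed to be ReLU here
delta_prev = upsampled * (z_prev > 0)
print(delta_prev.shape)                          # (4, 4)
```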

 

4. Given the error of a convolution layer, derive the error of the previous hidden layer

Derivation link: https://blog.csdn.net/legend_hua/article/details/81590979

The formula is as follows:

δ^{l-1} = δ^l ∗ rot180(W^l) ⊙ σ'(z^{l-1})

Compare it with the backward error recursion of an ordinary network:

δ^{l-1} = (W^l)^T δ^l ⊙ σ'(z^{l-1})

The difference is that the transpose of the layer's weight matrix W is replaced by a rotation of the kernel by 180 degrees, i.e. the kernel is flipped upside down and then left to right. This is in fact what the "convolution" in the name amounts to here (it can simply be understood as a mathematical trick). Concretely, let Q be the current layer's error δ^l padded with zeros all around (so the computation lines up), and let W be the kernel flipped by 180 degrees; then P, the previous layer's error, is the result of convolving Q with W.
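
Here is a small sketch of that computation (an assumed NumPy/SciPy illustration, not the post's own code): pad δ^l with zeros to get Q, rotate W by 180 degrees, do a plain valid convolution, and multiply elementwise by σ'(z^{l-1}) (a ReLU previous layer is assumed for the last step).

```python
import numpy as np
from scipy.signal import correlate2d

delta = np.array([[0.1, -0.3, 0.2],
                  [0.0,  0.5, -0.1],
                  [0.4,  0.2,  0.3]])     # error delta^l of this conv layer, 3x3
W = np.array([[1.0, 0.0],
              [2.0, -1.0]])               # 2x2 kernel used in the forward pass
z_prev = np.random.randn(4, 4)            # pre-activation of the previous layer, 4x4

a, b = W.shape
Q = np.pad(delta, ((a - 1, a - 1), (b - 1, b - 1)))  # zero-pad delta by a-1 / b-1 all around
W_rot = np.rot90(W, 2)                    # flip up-down then left-right: rot180(W)
P = correlate2d(Q, W_rot, mode='valid')   # the "full" convolution of delta with rot180(W)
delta_prev = P * (z_prev > 0)             # times sigma'(z^{l-1}), ReLU assumed
print(delta_prev.shape)                   # (4, 4) = size of the previous layer's output
```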

5. Given the error of a convolution layer, derive the gradients of its W and b

After the steps above we have the error δ of every layer; then:
a) For a fully connected layer, the gradients of W and b are found with the ordinary network backpropagation algorithm.
b) A pooling layer has no W or b, so no gradients need to be computed for it.
c) Only the convolution layer's W and b need special treatment. Looking at W first:
∂J/∂W^l = a^{l-1} ∗ δ^l

Comparing with the formula for the gradient of W in an ordinary network,

∂J/∂W^l = δ^l (a^{l-1})^T

the difference is that the matrix product with the transposed previous-layer output is replaced by a ("valid") convolution of the previous layer's output a^{l-1} with this layer's error.
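
A sketch of the W gradient under the same assumptions: the previous layer's output a^{l-1} is convolved ("valid", no kernel flip) with this layer's error δ^l, giving one gradient entry per kernel weight.

```python
import numpy as np
from scipy.signal import correlate2d

a_prev = np.arange(16.0).reshape(4, 4)    # output a^{l-1} of the previous layer, 4x4
delta = np.array([[0.1, -0.3, 0.2],
                  [0.0,  0.5, -0.1],
                  [0.4,  0.2,  0.3]])     # error delta^l of this conv layer, 3x3

# grad_W[p, q] = sum_{i,j} delta[i, j] * a_prev[i + p, j + q]
grad_W = correlate2d(a_prev, delta, mode='valid')
print(grad_W.shape)                       # (2, 2), the same shape as the kernel W
```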
For b it is a little special: in a CNN the error δ is a three-dimensional tensor while b is only a vector, so we cannot simply set the gradient equal to δ as in an ordinary network. The usual practice is to sum all the entries of each sub-matrix (channel) of δ; the resulting vector is the gradient of b:

∂J/∂b^l = Σ_{u,v} (δ^l)_{u,v}
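
A one-line sketch (assumed) of the b gradient: δ is treated as a 3-D tensor with one error matrix per output channel, and each matrix's entries are summed into a single scalar.

```python
import numpy as np

delta = np.random.randn(8, 5, 5)    # delta^l: 8 output channels, each a 5x5 error matrix
grad_b = delta.sum(axis=(1, 2))     # sum each channel's entries -> gradient vector for b
print(grad_b.shape)                 # (8,)
```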

 
