Deep learning - Backpropagation

  Let's finally make sense of what back-propagation means.
  The core idea to grasp is that the forward chain of computations is built first, and the derivative is then obtained by walking back along that chain, differentiating step by step.
  Why do we want derivative values at all? Because we are looking for a minimum. The minimum of what?
  The minimum of the loss function; its derivative can be obtained either by numerical differentiation (the difference-quotient definition of the derivative) or by back-propagation.
  Why do we look for the extremum of the loss function?
  Because the weight values W are inferred from the extremum of the loss function. Our real goal is to find the best possible W, and the best W is the one that drives the loss function to its minimum; to locate that minimum, we take the derivative of the loss function.
  One thing has to be emphasized, though: a derivative is always the derivative for a particular batch of data, i.e. at a particular point of the space; a derivative detached from a concrete point is meaningless. Solving for the derivative really means differentiating at a sample (a point of the space). Because the derivative at each point points toward the fastest descent, if in every iteration the parameters move along that direction at the current point (roughly the normal to the contour line there, though what exactly the normal and the contours are is a story of its own), the parameters head toward the optimal solution overall. One more thing to note: to differentiate the loss we do not reverse the loss formula (the equation) by itself; the backward derivation runs along the network's own layers, from the loss layer back through the hidden layers.
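
  To make the numerical route just mentioned concrete, here is a minimal sketch (not from the original post) of estimating the derivative at a concrete point with the difference quotient and taking one descent step there; numerical_gradient, the toy loss f and the point W are purely illustrative names.

    import numpy as np

    def numerical_gradient(f, x, h=1e-4):
        # central-difference estimate of the gradient of f at the concrete point x
        grad = np.zeros_like(x)
        for i in range(x.size):
            orig = x.flat[i]
            x.flat[i] = orig + h
            fxh1 = f(x)                   # f(x + h)
            x.flat[i] = orig - h
            fxh2 = f(x)                   # f(x - h)
            grad.flat[i] = (fxh1 - fxh2) / (2 * h)
            x.flat[i] = orig              # restore the original value
        return grad

    # one gradient-descent step at that point: move W against the gradient
    f = lambda W: np.sum(W ** 2)          # toy "loss", only so the sketch runs on its own
    W = np.array([3.0, -2.0])
    W -= 0.1 * numerical_gradient(f, W)
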
  Here is the code for the gradient:

    def gradient(self, x, t):
        # forward pass: we want the derivative of the loss function,
        # so the forward chain has to be constructed here first
        self.loss(x, t)
        # derivative of the output layer (SoftmaxWithLoss)
        dout = 1
        dout = self.lastlayer.backward(dout)
        # derivatives of the hidden layers, walked in reverse order
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)
        grads = {}
        # +2: the final Affine layer (the one feeding the last layer) is included as well
        for idx in range(1, self.hidden_layer_size + 2):
            grads["W" + str(idx)] = self.layers["Affine" + str(idx)].dW + self.weight_decay_lambda * self.layers["Affine" + str(idx)].W
            grads["b" + str(idx)] = self.layers["Affine" + str(idx)].db

        return grads
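
  The backward calls above only work because every layer caches what it saw during the forward pass, which is also why self.loss(x, t) has to run first. The Affine layer itself is not listed in this post, so the following is only a sketch, in the usual from-scratch style, of what its forward/backward could look like; it is meant to give dW, db and the dout handed from layer to layer a concrete shape.

    import numpy as np

    class Affine:
        # minimal sketch of a fully connected layer that caches its input for backward
        def __init__(self, W, b):
            self.W, self.b = W, b
            self.x = None
            self.dW, self.db = None, None

        def forward(self, x):
            self.x = x                         # backward needs the forward input
            return np.dot(x, self.W) + self.b

        def backward(self, dout):
            self.dW = np.dot(self.x.T, dout)   # gradient w.r.t. the weights
            self.db = np.sum(dout, axis=0)     # gradient w.r.t. the bias
            return np.dot(dout, self.W.T)      # dout handed on to the previous layer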

  Note that loss is called first here precisely to build the forward path: the forward pass carries x through every layer in turn (the Affine layers, the Relu layers, and so on) and finally reaches the SoftmaxWithLoss layer, whose core is the cross_entropy function. Also note that the backward pass is not the reverse of cross_entropy on its own; it runs from SoftmaxWithLoss back through Relu to Affine, reversing the path between the layers. Only after a point of the space (x) has been pushed forward through the network can the backward pass walk the same path in reverse and produce the derivative of the cross-entropy loss.
  Here is the implementation of the loss function:

    def loss(self, x, t):
        y = self.predict(x)
        weight_decay = 0
        # note the +2 here: the lastLayer also has to be counted in
        for idx in range(1, self.hidden_layer_size + 2):
            W = self.params["W" + str(idx)]
            weight_decay += 0.5 * self.weight_decay_lambda * np.sum(W ** 2)

        return self.lastlayer.forward(y, t) + weight_decay
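
  self.lastlayer.forward(y, t) is where softmax and cross_entropy live. That layer is not listed in the post either; assuming one-hot labels t, a minimal sketch of such a SoftmaxWithLoss layer could look like this:

    import numpy as np

    def softmax(x):
        x = x - np.max(x, axis=1, keepdims=True)           # shift for numerical stability
        e = np.exp(x)
        return e / np.sum(e, axis=1, keepdims=True)

    def cross_entropy(y, t):
        batch_size = y.shape[0]
        return -np.sum(t * np.log(y + 1e-7)) / batch_size  # average loss over the batch

    class SoftmaxWithLoss:
        # output layer: softmax followed by cross-entropy
        def __init__(self):
            self.y, self.t = None, None

        def forward(self, y, t):
            self.t = t
            self.y = softmax(y)                            # y here are the raw scores
            return cross_entropy(self.y, t)

        def backward(self, dout=1):
            batch_size = self.t.shape[0]
            return dout * (self.y - self.t) / batch_size   # holds when t is one-hot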

  One last question: why is there a Relu layer at all? If there were only Affine layers, the network could learn nothing but a linear function, yet many scenarios follow curved, non-linear distributions. That is where a non-linear layer has to be added, so that the neural network can learn the non-linearity and actually fit the data.
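
  For completeness, a minimal sketch of such a non-linear Relu layer (again not the post's original code): it passes positive values through unchanged and blocks both the value and the gradient wherever the input was not positive.

    class Relu:
        # non-linear layer: keep positives, zero out the rest, in forward and backward alike
        # (x and dout are expected to be NumPy arrays)
        def __init__(self):
            self.mask = None

        def forward(self, x):
            self.mask = (x <= 0)        # remember where the input was non-positive
            out = x.copy()
            out[self.mask] = 0
            return out

        def backward(self, dout):
            dout = dout.copy()
            dout[self.mask] = 0         # no gradient flows through the clipped positions
            return dout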

 
