Building a Neural Network with Numpy, Part 2: Implementing Gradient Descent

 

  Hi everyone! Welcome to Part 2 of building a neural network with Numpy. In Part 1 we showed you how to build a simple neural network with Numpy and implemented the feedforward pass.

  In this installment we cover gradient descent, and again implement it with Numpy. Before getting to the code, let's go over the basics of gradient descent.

  Gradient descent: solving for model parameters iteratively

  As mentioned in Part 1, the simplest neural network consists of three parts: the input layer, the hidden layer, and the output layer. Its working mechanism is entirely analogous to a function: Y = W * X + b. That is, given input data X, it produces output Y.
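  To recap the feedforward computation from Part 1, here is a minimal Numpy sketch (all values are made up for illustration):

import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])  # input data, one sample per row
W = np.array([[0.5], [-0.5]])           # weights
b = 1.0                                 # bias
Y = X @ W + b                           # feedforward output: Y = W * X + b
print(Y)                                # [[0.5], [0.5]]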

  How do we assess the quality of such a function, i.e., how well it fits? The simplest way is to measure the gap between the true values and the output values: the smaller the gap, the stronger the function's ability to express the data.

  This measure of the gap is called the loss function. Clearly, the smaller the value of the loss function, the better the original function performs.
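  For example, a minimal sketch of a mean-squared-error loss in Numpy (the values are made up for illustration):

import numpy as np

y_true = np.array([3.0, 5.0, 7.0])      # true values
y_pred = np.array([2.5, 5.5, 6.0])      # outputs of the function
loss = np.mean((y_pred - y_true) ** 2)  # the smaller, the better the fit
print(loss)                             # 0.5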

  So at what parameter values does the function attain its minimum? A local minimum can usually be found via the derivative (it is attained at an extremum). Gradient descent is a method for finding the parameter values at which the function attains a minimum.

  The mathematics of gradient descent

  Take linear regression as an example. Its hypothesis function is

h_\theta(x_1, x_2, \dots, x_n) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n

where the \theta_i (i = 0, 1, 2, \dots, n) are the model parameters and the x_i (i = 1, 2, \dots, n) are the n feature values of each sample. This representation can be simplified by adding a feature x_0 = 1, giving

h_\theta(x_0, x_1, \dots, x_n) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n

For the linear regression hypothesis above, the corresponding loss function is

J(\theta_0, \theta_1, \dots, \theta_n) = \frac{1}{2m} \sum_{j=1}^{m} \left( h_\theta(x^{(j)}) - y^{(j)} \right)^2

(The factor 1/2m in front of the loss function is mainly there to make the derivative come out cleaner; in fact either MSE or SSE can serve as the loss function, since for a given sample set the two differ only by a constant factor.)

  Initialize the algorithm parameters: mainly \theta_0, \theta_1, \dots, \theta_n. We usually initialize them all to 0 and initialize the step size to 1, then tune these values during optimization.

  The expression for the gradient with respect to \theta_i is as follows:

\frac{\partial}{\partial \theta_i} J(\theta_0, \theta_1, \dots, \theta_n) = \frac{1}{m} \sum_{j=1}^{m} \left( h_\theta(x^{(j)}) - y^{(j)} \right) x_i^{(j)}

  Multiplying the gradient of the loss function by the step size (learning rate) gives the amount to descend from the current position, i.e.:

\theta_i := \theta_i - \alpha \frac{\partial}{\partial \theta_i} J(\theta_0, \theta_1, \dots, \theta_n)

  Gradient descent in matrix form

  Corresponding to the linear function above, its matrix representation is:

h_\theta(X) = X \theta

  The expression for the loss function is:

J(\theta) = \frac{1}{2m} (X\theta - Y)^T (X\theta - Y)

  where Y is the vector of sample outputs.

  The gradient formula is expressed as:

\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - Y)

  Linear regression is again used to describe the concrete algorithmic process. The partial derivative of the loss function with respect to the vector \theta is calculated as:

\frac{\partial}{\partial \theta} J(\theta) = \frac{1}{m} X^T (X\theta - Y)

  Iteration step:

\theta := \theta - \alpha \frac{1}{m} X^T (X\theta - Y)

  The derivation uses two matrix differentiation formulas:

\nabla_x \left( x^T a \right) = a

\nabla_x \left( x^T A x \right) = 2 A x \quad \text{(for symmetric } A \text{)}
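  Applying these two formulas to the matrix loss function above yields the gradient; a sketch of the derivation:

J(\theta) = \frac{1}{2m} \left( \theta^T X^T X \theta - 2 Y^T X \theta + Y^T Y \right)

\nabla_\theta J(\theta) = \frac{1}{2m} \left( 2 X^T X \theta - 2 X^T Y \right) = \frac{1}{m} X^T (X\theta - Y)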

  Implementing gradient descent in Python

  First, import the two necessary packages.
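  A minimal sketch, assuming the two packages are Numpy and Pandas:

import numpy as np   # matrix and vector operations
import pandas as pd  # loading the dataset (assumption: data comes from a file)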

  Define a normalization function, so that overly large or small values do not affect the solution.
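  A minimal sketch of such a function, standardizing each feature to zero mean and unit variance (the name regularize and the exact scheme are assumptions):

def regularize(dataset):
    # Standardize column-wise so that features on very different scales
    # do not dominate the gradient updates.
    # Apply this to the raw features, before appending the x0 = 1 column.
    data = np.asarray(dataset, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (data - mean) / std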

  Define the gradient descent function:
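  A minimal sketch consistent with the description below (the parameter names dataset, alpha, and maxcycles come from the text; labels and the name grad_descent are assumptions):

def grad_descent(dataset, labels, alpha, maxcycles):
    x = np.asarray(dataset, dtype=float)                # m samples, n features
    y = np.asarray(labels, dtype=float).reshape(-1, 1)  # column vector of outputs
    m, n = x.shape
    weights = np.zeros((n, 1))               # initialize all parameters to 0
    for _ in range(maxcycles):
        grad = x.T @ (x @ weights - y) / m   # matrix gradient formula above
        weights = weights - alpha * grad     # step against the gradient
    return weights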

  Here dataset represents the input data, alpha is the learning rate, and maxcycles is the maximum number of iterations.

  The function returns weights, the estimated parameters. np.zeros is the initialization function, and grad is obtained by solving the matrix gradient descent formula given above.
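  For example, fitting y = 2x + 1 on a tiny synthetic dataset (all values are made up for illustration):

x = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # first column plays the role of x0 = 1
y = np.array([1.0, 3.0, 5.0, 7.0])

weights = grad_descent(x, y, alpha=0.1, maxcycles=1000)
print(weights)   # converges to approximately [[1.0], [2.0]]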
