Numerical differentiation

    The vector that collects the partial derivatives with respect to all of the variables is called the gradient. The gradient can be computed as shown below; the method used here obtains it by numerical differentiation (a numerical gradient).

import numpy as np

def numerical_gradient(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)  # generate an array with the same shape as x

    for idx in range(x.size):
        tmp_val = x[idx]

        # compute f(x + h)
        x[idx] = tmp_val + h
        fxh1 = f(x)

        # compute f(x - h)
        x[idx] = tmp_val - h
        fxh2 = f(x)

        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value

    return grad
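
    As a quick check (the function and the point are illustrative, not taken from the original text): for f(x0, x1) = x0**2 + x1**2 the analytic gradient at (3, 4) is (6, 8), and the numerical gradient agrees closely.

def function_2(x):
    return x[0]**2 + x[1]**2

print(numerical_gradient(function_2, np.array([3.0, 4.0])))  # approximately [6. 8.]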

Gradient 

    Although the direction of the gradient does not necessarily point at the minimum, it is the direction in which the value of the function decreases most steeply from the current point. Therefore, when searching for the minimum of a function (or for a value as small as possible), it makes sense to use the gradient as a clue and keep moving in the gradient direction. This is the idea behind the gradient method (gradient descent).

    η denotes the amount of each update and is called the learning rate in neural network learning. The learning rate determines how much is learned in a single step, that is, to what extent the parameters are updated.
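
    Written out for a function of two variables f(x0, x1), one step of gradient descent updates both variables with the same learning rate (this is the standard update rule that the code below implements):

        x0 ← x0 − η ∂f/∂x0
        x1 ← x1 − η ∂f/∂x1

    This step is then repeated a fixed number of times.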

     Gradient descent can be implemented in Python as follows:

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x

    for i in range(step_num):
        grad = numerical_gradient(f, x)  # gradient at the current position
        x -= lr * grad                   # step in the negative gradient direction

    return x

      The argument f is the function to be optimized, init_x is the initial value, lr is the learning rate, and step_num is the number of repetitions of the gradient method. numerical_gradient(f, x) finds the gradient of the function, x is updated by the gradient multiplied by the learning rate, and this update is repeated step_num times.
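
    As a usage sketch (the target function and the numbers are illustrative, not taken from the original text), minimizing f(x0, x1) = x0**2 + x1**2 from the starting point (-3.0, 4.0) drives both coordinates toward the true minimum at (0, 0):

import numpy as np

def function_2(x):
    return x[0]**2 + x[1]**2

init_x = np.array([-3.0, 4.0])
print(gradient_descent(function_2, init_x, lr=0.1, step_num=100))
# both components end up extremely close to 0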

    Parameters such as the learning rate are called hyperparameters.

Neural network gradient

   Neural network learning also requires gradients. The gradient here means the gradient of the loss function with respect to the weight parameters.

     Each element of this gradient is the partial derivative of the loss with respect to the corresponding element of W, so the gradient has the same shape as W.
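
    As a minimal sketch (not from the original text; the softmax and cross-entropy helpers and the concrete numbers are assumptions for illustration), the loss of a tiny one-layer network can be treated as a function of its 2x3 weight matrix W, and the numerical gradient of the loss then has the same 2x3 shape as W. Because the numerical_gradient above iterates over a 1-D array, this sketch uses an np.nditer variant that handles multi-dimensional arrays.

import numpy as np

def softmax(a):
    a = a - np.max(a)                 # shift for numerical stability
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a)

def cross_entropy_error(y, t):
    return -np.sum(t * np.log(y + 1e-7))

x = np.array([0.6, 0.9])              # input (illustrative)
t = np.array([0, 0, 1])               # one-hot label (illustrative)
W = np.random.randn(2, 3)             # 2x3 weight matrix

def loss(W):
    y = softmax(np.dot(x, W))
    return cross_entropy_error(y, t)

def numerical_gradient_nd(f, X):
    # same central-difference idea as numerical_gradient, but iterates over
    # every element of a multi-dimensional array via np.nditer
    h = 1e-4
    grad = np.zeros_like(X)
    it = np.nditer(X, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = X[idx]
        X[idx] = tmp_val + h
        fxh1 = f(X)
        X[idx] = tmp_val - h
        fxh2 = f(X)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        X[idx] = tmp_val
        it.iternext()
    return grad

dW = numerical_gradient_nd(loss, W)   # same 2x3 shape as W
print(dW)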

 

