Gradient descent algorithm: study notes

Gradient Descent (GD) is currently the most widely used method for solving optimization problems in machine learning and deep learning. It is not a machine learning model itself, but a search-based optimization method. Its role is to minimize a model's loss function: using the known training data, it searches for the parameters at which the loss is smallest, and those optimal parameters define the best-fitting model. So what exactly is gradient descent?

1. Concept

The gradient is a vector with the same dimension as the parameter vector. Informally, it is the derivative of a multivariate function: take the partial derivative with respect to each variable and collect the results, separated by commas, inside parentheses. The parentheses emphasize that the gradient really is a vector. For example, the gradient of the linear regression loss function L(a, b) is: $\triangledown f=(\frac{\partial{L}}{\partial{a}},\frac{\partial{L}}{\partial{b}})$
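As a concrete illustration (this example function is my own, not from the original notes), the sketch below computes an analytic gradient component by component and cross-checks it with a central finite-difference approximation:

```python
# Hypothetical example: gradient of f(a, b) = 3a^2 + 5b, checked numerically.

def f(a, b):
    # example multivariate function
    return 3 * a**2 + 5 * b

def grad_f(a, b):
    # analytic gradient: (df/da, df/db) = (6a, 5)
    return (6 * a, 5.0)

def numeric_grad(func, a, b, h=1e-6):
    # central-difference approximation of each partial derivative
    da = (func(a + h, b) - func(a - h, b)) / (2 * h)
    db = (func(a, b + h) - func(a, b - h)) / (2 * h)
    return (da, db)

print(grad_f(2.0, 1.0))           # (12.0, 5.0)
print(numeric_grad(f, 2.0, 1.0))  # approximately (12.0, 5.0)
```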

2. Calculation process

1. Steps:

① Take the partial derivative of the loss with respect to each parameter to obtain the gradient $\triangledown f$.
② Choose an initial parameter vector, a learning rate η, and a stopping threshold.
③ Iterate: evaluate $\triangledown f$ at the current parameter vector. If its magnitude is less than or equal to the threshold, stop — the current parameter vector is a local optimum. Otherwise, compute the next parameter vector as the current parameter vector − η · $\triangledown f$ and repeat.
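The three steps above can be sketched as a minimal loop (a one-dimensional sketch; the function names and the `max_iter` safety cap are my own additions, not from the original notes):

```python
# Minimal gradient descent loop, assuming a scalar parameter x.

def gradient_descent(grad, x0, eta=0.1, threshold=1e-4, max_iter=10_000):
    """Repeat x <- x - eta * grad(x) until |grad(x)| <= threshold."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) <= threshold:   # stopping rule from step 3
            break
        x = x - eta * g           # update: previous point - eta * gradient
    return x

# f(x) = 3x^2 + 5x has its minimum where f'(x) = 6x + 5 = 0, i.e. x = -5/6
x_min = gradient_descent(lambda x: 6 * x + 5, x0=1.0)
print(round(x_min, 3))  # -0.833
```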

2. A function of one variable

Univariate function: $f(x)=3x^2+5x$
Step 1: find the derivative: $f'(x)=6x+5$.
Step 2: initialize $x_0$, η, and the threshold: $x_0=1$, $η=0.1$, $threshold=0.0001$.
Step 3: compute $f'(x_0)$ and compare it with the threshold.
Step 4: iterate. The process is shown in the following table:
(Figure from the original post: a table of the iteration values, not reproduced here.)
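The original table image is not available, but the first few rows can be reproduced directly from the update rule (a reconstruction, not the original figure):

```python
# Reproduce the first few iterations of x <- x - eta * f'(x)
# for f(x) = 3x^2 + 5x with x_0 = 1 and eta = 0.1.

def f(x):
    return 3 * x**2 + 5 * x

def df(x):
    return 6 * x + 5

x, eta = 1.0, 0.1
for step in range(5):
    print(f"step {step}: x = {x:.4f}, f(x) = {f(x):.4f}, f'(x) = {df(x):.4f}")
    x = x - eta * df(x)
```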

2.1 Simple code demo

First, define the original function and its derivative by hand.
def loss_function(x):
    return 3 * (x**2) + 5 * x

def det_function(x):
    return 6 * x + 5
Then, define the gradient descent routine.
def get_GD(od_f=None, f=None, x_0=None, eta=0.001, threshold=0):
    x_all = []
    od_f_all = []
    det_f_all = []
    count_n = 0
    while True:
        count_n += 1
        y = od_f(x_0)
        # evaluate the derivative at the current x
        det_f = f(x_0)
        od_f_all.append(y)
        x_all.append(x_0)
        det_f_all.append(det_f)
        # compute the next point
        x_0 = x_0 - eta * det_f
        # check whether we have arrived: compare |f'(x)| with the threshold
        # (the absolute value matters, since the derivative can be negative)
        if abs(det_f) <= threshold:
            break

    return x_all, od_f_all, det_f_all, count_n
Finally, set x_0=1, eta=0.1, threshold=0.0001 and inspect the plot.
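The "complete code block" from the original post is only an image there; the following self-contained version (functions repeated so the snippet runs on its own, with the stopping test taken on |f'(x)|) reproduces the run with the stated settings:

```python
# Full demo with x_0 = 1, eta = 0.1, threshold = 0.0001.

def loss_function(x):
    return 3 * (x**2) + 5 * x

def det_function(x):
    return 6 * x + 5

def get_GD(od_f, f, x_0, eta=0.1, threshold=0.0001):
    x_all, od_f_all, det_f_all, count_n = [], [], [], 0
    while True:
        count_n += 1
        od_f_all.append(od_f(x_0))
        x_all.append(x_0)
        det_f = f(x_0)
        det_f_all.append(det_f)
        x_0 = x_0 - eta * det_f
        if abs(det_f) <= threshold:
            break
    return x_all, od_f_all, det_f_all, count_n

x_all, od_f_all, det_f_all, count_n = get_GD(loss_function, det_function, 1)
print("iterations:", count_n)
print("final x:", round(x_all[-1], 4))  # near the true minimizer -5/6
```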

(Figure from the original post: the iterates plotted on the curve of f(x), not reproduced here.)



Origin blog.csdn.net/sun91019718/article/details/105084056