The theory of Newton's method and the gradient descent algorithm

I. The principle of Newton's method:

(1) Definition:

The term "Newton's method" generally refers to the Newton iterative method, also known as the Newton-Raphson method. It is an approximation method for solving equations over the real and complex number fields, proposed by Newton in the 17th century.

(2) Background:

Most equations do not have closed-form root formulas, so finding exact roots is difficult or even impossible, which makes approximate roots especially important. Newton's method uses the first few terms of the Taylor series of a function f(x) to find roots of the equation f(x) = 0. It is an important root-finding technique, and its greatest advantage is quadratic convergence near a simple root of f(x) = 0. The method can also be used to find multiple roots and complex roots of equations, in which case convergence is only linear, although it can be made superlinear by certain modifications. The method is also widely used in computer programs.
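To make the root-finding iteration concrete, here is a minimal sketch (our own illustration, not from the original post) of the update x_{n+1} = x_n - f(x_n)/f'(x_n), applied to f(x) = x^2 - 2, whose positive root is the square root of 2; the function, tolerance, and starting point are assumptions chosen only for illustration.

def newton_root(f, df, x0, tol=1e-10, max_iter=50):
    # Newton-Raphson iteration for f(x) = 0: x_{n+1} = x_n - f(x_n) / f'(x_n)
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:   # stop once the residual is small enough
            break
        x = x - fx / df(x)
    return x

# example: the positive root of t^2 - 2 is sqrt(2), approximately 1.41421356
print(newton_root(lambda t: t**2 - 2, lambda t: 2*t, x0=1.0))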

(3) Formulas:

1. The case of a function of one variable:

To make the derivation easier to follow, first consider a function of one variable. By the Taylor expansion of a single-variable function, expanding the objective function at the point x_0 gives:

f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2

where terms of order higher than two have been ignored. Starting from x_0, we want to find a point where the derivative is zero. Differentiating both sides of the expansion with respect to x and setting the derivative to zero gives:

f'(x_0) + f''(x_0)(x - x_0) = 0

which can be solved to give:

x = x_0 - \frac{f'(x_0)}{f''(x_0)}

This is the position of the next point, x_1. Repeating this process yields the iterative formula of Newton's method:

x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}

Given an initial point x_0, the formula is iterated until a point with derivative (approximately) zero is reached or the maximum number of iterations is exhausted.
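As a quick sanity check (our own example, not from the original post), take f(x) = x^2 - 4x + 1, so that f'(x) = 2x - 4 and f''(x) = 2. Starting from x_0 = 10, a single Newton step gives

x_1 = x_0 - \frac{f'(x_0)}{f''(x_0)} = 10 - \frac{16}{2} = 2

and f'(2) = 0, so for a quadratic function Newton's method reaches the stationary point in one step.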

2. The case of a function of several variables:

Now extend this to functions of several variables. If the concepts of gradient and Hessian are unfamiliar, please consult a calculus textbook or read the earlier SIGAI posts on optimization. By the multivariate Taylor expansion, expanding the objective function at the point x_0 gives:

f(x) \approx f(x_0) + \nabla f(x_0)^T (x - x_0) + \frac{1}{2} (x - x_0)^T H(x_0) (x - x_0)

Ignoring terms of order higher than two and taking the gradient of both sides, the gradient vector of the function is:

\nabla f(x) \approx \nabla f(x_0) + H(x_0)(x - x_0)

where H(x_0) is the Hessian matrix, written simply as H below. Setting the gradient to zero gives:

H (x - x_0) = -\nabla f(x_0)

which is a system of linear equations. If the gradient vector is abbreviated as g, the solution can be written as:

x = x_0 - H^{-1} g

Starting from the initial point x_0, the Hessian matrix and gradient vector are computed repeatedly at the current point, and the following formula is iterated:

x_{k+1} = x_k - H_k^{-1} g_k

until a stationary point of the function is reached. The quantity -H^{-1} g is called the Newton direction. The iteration terminates when the norm of the gradient is close to 0 or when the decrease in the function value falls below a specified threshold.
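As an illustration (our own sketch, not from the original post), the following NumPy code applies the iteration x_{k+1} = x_k - H^{-1} g to the quadratic f(x1, x2) = x1^2 + 2*x2^2 - 4*x1 - 2*x1*x2 that the gradient-descent example below also uses; the choice of function, starting point, and tolerance are assumptions made here for illustration.

import numpy as np

def grad(x):
    # gradient of f(x1, x2) = x1^2 + 2*x2^2 - 4*x1 - 2*x1*x2
    x1, x2 = x
    return np.array([2*x1 - 2*x2 - 4, 4*x2 - 2*x1])

def hess(x):
    # the Hessian is constant for this quadratic
    return np.array([[2.0, -2.0], [-2.0, 4.0]])

xk = np.array([0.0, 0.0])             # initial point x0
for k in range(20):
    g = grad(xk)
    if np.linalg.norm(g) < 1e-8:      # stop when the gradient norm is close to 0
        break
    d = np.linalg.solve(hess(xk), g)  # solve H d = g rather than forming H^{-1} explicitly
    xk = xk - d                       # Newton update: x_{k+1} = x_k - H^{-1} g
print(xk)                             # for this quadratic, a single step reaches (4, 2)

Because the objective is quadratic, the second-order Taylor expansion above is exact, so a single Newton step lands on the stationary point (4, 2).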

(4) Python examples:

1. Function of one variable:

The code is as follows:

from sympy import diff, symbols

# step: number of iterations; x0: initial position; obj: the function to minimize
def newtons(step, x0, obj):
    xnew = float(x0)                # work with a floating-point starting value
    obj_deri = diff(obj, x)         # first derivative, as in the formula above
    obj_sec_deri = diff(obj, x, 2)  # second derivative, as in the formula above
    for i in range(1, step + 1):
        # Newton update: x_{k+1} = x_k - f'(x_k) / f''(x_k)
        xnew = xnew - obj_deri.subs(x, xnew) / obj_sec_deri.subs(x, xnew)
        print('Iteration %d: %.5f' % (i, xnew))
    return xnew

x = symbols("x")  # symbolic variable
result = newtons(50, 10, x**6 + x)  # the stationary point of x**6 + x is near x = -0.699
print('Position after the final iteration: %.5f' % result)

The original post shows the iteration output as two screenshots, which are not preserved here.

2. Function of two variables:

The code is as follows:

"""
    用梯度法求二次函数f(x1,x2)=x1^2+2*x2^2-4*x1-2*x1*x2的极小值,极小点、
"""
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
 
def Fun(x1,x2):#原函数
    return x1*x1+2*x2*x2-4*x1-2*x1*x2
 
def PxFun(x1,x2):#偏x导
    return 2*x1-2*x2-4
 
def PyFun(x1,x2):#偏y导
    return 4*x2-2*x1
 
#初始化
i=0     #迭代次数
fig=plt.figure()#figure对象
ax=Axes3D(fig)#Axes3D对象
X1,X2=np.mgrid[-2:2:40j,-2:2:40j]#取样并作满射联合
Z=Fun(X1,X2)#取样点Z坐标打表
ax.plot_surface(X1,X2,Z,rstride=1,cstride=1,cmap="rainbow")
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('z')
 
#梯度下降
step=0.01   #下降系数
x1=0
x2=0#初始选取一个点
tag_x1=[x1]
tag_x2=[x2]
tag_z=[Fun(x1,x2)]#三个坐标分别打入表中,该表用于绘制点
new_x1=x1
new_x2=x2
Over=False
while Over==False:
    new_x1-=step*PxFun(x1,x2)
    new_x2-=step*PyFun(x1,x2)#分别作梯度下降
    if Fun(x1,x2)-Fun(new_x1,new_x2)<7e-9:#精度
        Over=True
    x1=new_x1
    x2=new_x2#更新旧点
    tag_x1.append(x1)
    tag_x2.append(x2)
    tag_z.append(Fun(x1,x2))#新点三个坐标打入表中
    i=i+1
 
#绘制点/输出坐标
ax.plot(tag_x1,tag_x2,tag_z,'r.')
plt.title('(x1,x2)~('+str(x1)+","+str(x2)+')')
plt.show()
print("迭代次数:",i)

The original post shows the resulting 3D surface plot and descent path as a screenshot, which is not preserved here.

II. The principle of the gradient descent algorithm:

(1) The three elements of gradient descent:

The three elements of the gradient method are: the starting point, the descent direction, and the step size.

(2) An analogy:

Here we make a vivid analogy: if the update is compared to a movement, then its three elements are the step size (how far to go), the direction, and the starting point. This picture gives us insight into the gradient problem: the starting point matters and is the focus of initialization, but the direction and the step size are the keys. In fact, the various gradient-based methods differ precisely in these two points.

The gradient descent update moves W along the negative gradient direction with a constant step length:

W_{new} = W - \Delta \cdot \frac{\nabla f(W)}{\lVert \nabla f(W) \rVert}

When the gradient is large, we are relatively far from the optimal solution and it is fine for W to move quickly; but when the gradient becomes small, i.e. close to the optimal solution, W keeps moving at the same fixed rate, so the update easily overshoots the optimum, and the iterates oscillate back and forth in its vicinity. Since the gradient is large far from the optimal solution and small near it, we let the step length follow this rhythm: replacing the constant step length with λ times the gradient norm finally gives the familiar formula

W_{new} = W - \lambda \nabla f(W)

Now the effective step length changes with how gentle or steep the slope is, even though λ itself is a constant.
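To see the difference between the two update rules numerically, here is a tiny sketch (our own, not from the original post) on the one-dimensional function f(w) = w^2, with the constant step length and λ both set to 0.1 and the starting point 1.05; these values are assumptions chosen only for illustration.

def grad(w):              # gradient of f(w) = w**2
    return 2.0 * w

delta, lam = 0.1, 0.1     # constant step length and learning rate
w_fixed, w_scaled = 1.05, 1.05

for _ in range(40):
    g = grad(w_fixed)
    if g != 0.0:
        w_fixed -= delta * g / abs(g)  # constant-length step along the negative gradient direction
    w_scaled -= lam * grad(w_scaled)   # step proportional to the gradient: W <- W - lambda * grad f(W)

print(w_fixed)   # keeps oscillating around the optimum (about +/- 0.05)
print(w_scaled)  # decays geometrically toward the optimum at 0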

(3) Formula:

The gradient descent iteration is computed as follows: starting from an initial point W_0, repeat

W_{k+1} = W_k - \lambda \nabla f(W_k)

until the norm of the gradient is sufficiently small.

III. Summary

The least-squares (normal equation) approach requires a large amount of computation: inverting the matrix is quite time-consuming, and the inversion can be numerically unstable. By contrast, gradient descent can be seen as a simpler, iterative way of solving the equation that appears in the final step of the least-squares method. Although gradient descent has some drawbacks, its per-iteration cost is small, so it is a somewhat better choice when the amount of data is large.
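To illustrate this comparison on a small least-squares problem, here is a sketch (our own, not from the original post) that solves the same problem once via the normal equations and once via gradient descent; the synthetic data, learning rate, and iteration count are assumptions chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))                        # synthetic design matrix
b = A @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=100)

# least squares via the normal equations: solve (A^T A) w = A^T b (requires a matrix solve/inverse)
w_direct = np.linalg.solve(A.T @ A, A.T @ b)

# gradient descent on f(w) = ||A w - b||^2 / (2 m): only matrix-vector products per iteration
m = len(b)
w = np.zeros(3)
lam = 0.1                                            # learning rate
for _ in range(2000):
    g = A.T @ (A @ w - b) / m                        # gradient of the least-squares objective
    w = w - lam * g

print(w_direct)
print(w)   # should closely match w_direct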

References:
[1] https://blog.csdn.net/sigai_csdn/article/details/80678812

[2] https://baijiahao.baidu.com/s?id=1613121229156499765&wfr=spider&for=pc
