Artificial intelligence and machine learning - gradient descent

I. Overview of gradient descent

Gradient descent is a first-order optimization algorithm. To find a local minimum of a function with gradient descent, one starts from the current point and iteratively steps a prescribed distance in the direction opposite to the gradient (or an approximation of it).
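Written out, one step of the method from the current point $x_k$ is

$$x_{k+1} = x_k - \alpha \, \nabla f(x_k)$$

where $\alpha > 0$ is the step size and $\nabla f$ is the gradient; the iteration repeats until the point stops moving.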

II. An intuitive picture of gradient descent

Take a person walking down a hill as an example. To reach the lowest point, they repeat the following steps (a code sketch follows the figure below):

Step 1: determine the current position.

Step 2: find the direction of steepest descent from that position.

Step 3: take a small step in that direction, arriving at a new position lower than the original.

Step 4: treat the new position as the current one and go back to step 1.

Step 5: stop once the lowest point is reached.

Following these steps leads to the lowest point, as the figure below illustrates.

[Figure: a walker descending a slope step by step toward the lowest point]
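A minimal sketch of these five steps in Python (the function f and the numbers here are illustrative choices of mine, not taken from the examples below):

def f(x):
    return (x - 3) ** 2      # an illustrative one-variable function

def grad(x):
    return 2 * (x - 3)       # its derivative f'(x)

x = 0.0                      # step 1: the current position
alpha = 0.1                  # size of each small step
for _ in range(1000):
    g = grad(x)              # step 2: -g is the direction of fastest descent
    x = x - alpha * g        # steps 3 and 4: move, then treat the new point as current
    if abs(g) < 1e-8:        # step 5: stop once the slope is essentially zero
        break
print(x)                     # approaches the minimum at x = 3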

III. Solving for a function's minimum with gradient descent

Take as an example a function y of n + 1 variables:

$$y = f(x_0, x_1, \ldots, x_n)$$

First, set the initial position from which the descent starts; it can be chosen arbitrarily:

$$x_0^{(0)}, \; x_1^{(0)}, \; \ldots, \; x_n^{(0)}$$

Next, take the partial derivative of y with respect to each variable:

$$\frac{\partial y}{\partial x_i}, \quad i = 0, 1, \ldots, n$$

Substituting the current values into the partial derivatives yields the formula for the next iterate:

$$x_i \leftarrow x_i - \alpha \, \frac{\partial y}{\partial x_i}$$
In the formula above, α is the learning rate, also called the step size.
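As a sketch of this iteration for a function of several variables (the finite-difference partial derivatives are my illustrative choice here; the examples below use hand-derived ones):

import numpy as np

def numeric_partials(f, x, h=1e-6):
    # Approximate each partial derivative with a central difference
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def gradient_descent(f, x0, alpha=0.01, tol=1e-10, max_iter=10000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = x - alpha * numeric_partials(f, x)  # the iteration formula above
        if abs(f(x_new) - f(x)) < tol:              # stop once y barely changes
            return x_new
        x = x_new
    return x

# e.g. gradient_descent(lambda v: (v[0] - 4)**2 + (v[1] - 2)**2, [0.0, 0.0], alpha=0.1)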

IV. Examples

1. Finding a function's minimum value and minimum point by gradient descent, in Excel and in Python

(1) Finding the minimum value and minimum point by gradient descent in Excel

First set the starting position and the learning rate:

[Screenshot: Excel cells holding the starting point and the learning rate]

Then, at the starting position, compute each partial derivative, subtract the learning rate times that partial derivative from the corresponding variable, and evaluate the new function value:

[Screenshot: the first row of the Excel iteration table]

With the first row known, the second row can be computed from it, and each later row is then obtained from the row above. Because the first row's formulas differ from the second's, while all rows from the second onward share the same formulas, the second row has to be filled in once by hand:

[Screenshot: the first two rows of the Excel iteration table]

The remaining iterations can be produced with Excel's drag-to-copy. Iteration stops once the function value converges; because the learning rate was set quite small, a fairly large number of iterations is needed.

[Screenshot: the completed Excel iteration table after convergence]

The final iteration gives a minimum value of -8 at the minimum point (4, 2); after enough iterations the value of x1 settles at 4.
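To make the first iteration concrete, start from (x1, x2) = (1, 1) with learning rate α = 0.0011, the same settings as the Python run below. The partial derivatives of y = x1² + 2x2² − 4x1 − 2x1x2 at that point are

$$\frac{\partial y}{\partial x_1} = 2x_1 - 4 - 2x_2 = -4, \qquad \frac{\partial y}{\partial x_2} = 4x_2 - 2x_1 = 2$$

so one step gives

$$x_1 \leftarrow 1 - 0.0011 \times (-4) = 1.0044, \qquad x_2 \leftarrow 1 - 0.0011 \times 2 = 0.9978$$

which matches the second entry of GD_X1 printed by the Python code below.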

(2) Computing the minimum value and minimum point in Python and plotting the surface

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
# Support Chinese characters in matplotlib figures
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False
%matplotlib inline

def f2(x1, x2):
    return x1**2 + 2*x2**2 - 4*x1 - 2*x1*x2

X1 = np.arange(-4, 4, 0.2)
X2 = np.arange(-4, 4, 0.2)
X1, X2 = np.meshgrid(X1, X2)  # turn X1 and X2 into n*m grid matrices for plotting
Y = np.array(list(map(lambda t: f2(t[0], t[1]), zip(X1.flatten(), X2.flatten()))))
Y.shape = X1.shape  # reshape the flat 1600-element Y back to the (40, 40) grid

# Plot the surface
fig = plt.figure(facecolor='w')
ax = fig.add_subplot(projection='3d')
ax.plot_surface(X1, X2, Y, rstride=1, cstride=1, cmap=plt.cm.jet)
ax.set_title(u'$y = x_1^2 + 2x_2^2 - 4x_1 - 2x_1x_2$')
plt.show()

[Figure: 3D surface plot of y = x1² + 2x2² − 4x1 − 2x1x2]

# The original 2D objective function
def f2(x1, x2):
    return x1**2 + 2*x2**2 - 4*x1 - 2*x1*x2
## Partial derivatives
def hx1(x1, x2):
    return 2*x1 - 4 - 2*x2
def hx2(x1, x2):
    return 4*x2 - 2*x1
x1 = 1
x2 = 1
alpha = 0.0011
# Record the points that gradient descent passes through
GD_X1 = [x1]
GD_X2 = [x2]
GD_Y = [f2(x1, x2)]
# Track the change in y and the iteration count
f_change = 1.0  # start above the threshold so the loop is entered
iter_num = 0

while f_change > 1e-10 and iter_num < 10000:  # stop when y changes by less than the threshold or the iteration cap is reached
    tmp_x1 = x1 - alpha * hx1(x1, x2)
    tmp_x2 = x2 - alpha * hx2(x1, x2)
    tmp_y = f2(tmp_x1, tmp_x2)
    f_change = np.absolute(tmp_y - f2(x1, x2))
    x1 = tmp_x1
    x2 = tmp_x2
    GD_X1.append(x1)
    GD_X2.append(x2)
    GD_Y.append(tmp_y)
    iter_num += 1
print(u"Final result: (%.5f, %.5f, %.5f)" % (x1, x2, f2(x1, x2)))
print(u"X values along the iteration; number of iterations: %d" % iter_num)
print(GD_X1)

# Plot the descent path on the surface
fig = plt.figure(facecolor='w', figsize=(20, 18))
ax = fig.add_subplot(projection='3d')
ax.plot_surface(X1, X2, Y, rstride=1, cstride=1, cmap=plt.cm.jet)
ax.plot(GD_X1, GD_X2, GD_Y, 'ko-')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_title(u'$y = x_1^2+2x_2^2-4x_1-2x_1x_2$\nlearning rate: %.3f; final solution: (%.3f, %.3f, %.3f); iterations: %d' % (alpha, x1, x2, f2(x1, x2), iter_num), fontsize=30)
plt.show()
Final result: (3.99942, 1.99964, -8.00000)
X values along the iteration; number of iterations: 10000
[1, 1.0044, 1.00878548, 1.013156514536, 1.0175131777223392, 1.021855543254118, 1.0261836844096073, 1.0304976740526408.........]

[Figure: the descent path (black dots) overlaid on the 3D surface]
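The iterative answer can be verified analytically: setting both partial derivatives to zero,

$$2x_1 - 4 - 2x_2 = 0, \qquad 4x_2 - 2x_1 = 0$$

gives x1 = 2x2, hence x2 = 2 and x1 = 4, with y(4, 2) = 16 + 8 − 16 − 16 = −8, exactly the minimum the iteration converged to.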

2. Solving a multiple linear regression equation by gradient descent and comparing the parameters with those from the least squares method

(1) The least squares method

① Results in Excel

The chart drawn in Excel:
[Screenshot: Excel scatter chart of the shop data]
The regression equation computed in Excel:
[Screenshot: Excel regression output]

② Results computed and plotted in Python
# Import the needed libraries
from sklearn import linear_model        # linear_model provides sklearn's linear regression estimator
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Read the local data
data=pd.read_excel('media/作业2.xlsx')
x=data[['店铺的面积(坪)','距离最近的车站(m)']]
x

[Table: the dataframe x with the shop area and station distance columns]

y=data['月营业额(万日元)']
y
梦之丘总店     469
寺井站大厦店    366
曾根店       371
桥本大街店     208
桔梗町店      246
邮政局前店     297
水道町站前店    363
六条站大厦店    436
若叶川店      198
美里店       364
Name: 月营业额(万日元), dtype: int64
model=linear_model.LinearRegression()
model.fit(x,y)
# Read the coefficients and the intercept off the fitted sklearn model
a1=model.coef_[0]
a2=model.coef_[1]
b=model.intercept_
print("coefficient a1="+str(a1)+"; coefficient a2="+str(a2)+"; intercept b="+str(b))
coefficient a1=41.513478256438496; coefficient a2=-0.3408826856636194; intercept b=65.32391638894819
# Print the regression equation
print("Multiple linear regression: y="+str(a1)+"*x1"+str(a2)+"*x2+"+str(b))
Multiple linear regression: y=41.513478256438496*x1-0.3408826856636194*x2+65.32391638894819
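As a cross-check on sklearn, the same coefficients follow from the least squares normal equation θ = (XᵀX)⁻¹Xᵀy. A sketch, assuming the x dataframe and y series loaded above are still in scope:

import numpy as np
# Prepend a column of ones so the first entry of theta is the intercept b
X = np.column_stack([np.ones(len(x)), x.values])
theta = np.linalg.solve(X.T @ X, X.T @ y.values)
print("b=%.6f, a1=%.6f, a2=%.6f" % (theta[0], theta[1], theta[2]))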
# Support Chinese labels in matplotlib
from pylab import mpl
mpl.rcParams['font.sans-serif']=['FangSong']
mpl.rcParams['axes.unicode_minus']=False

sns.pairplot(data, x_vars=['店铺的面积(坪)','距离最近的车站(m)'], y_vars='月营业额(万日元)', height=8, aspect=1.5, kind='reg')
plt.show()

The plots of monthly turnover against shop area and against distance to the nearest station show that turnover rises with shop area and falls as the distance from the station grows.

[Figure: regression plots of monthly turnover against each of the two features]

(2) Solving the multiple linear regression equation by gradient descent

import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
data=np.genfromtxt('media/车站问题.csv',delimiter=',')
x_data=data[:,:-1]
y_data=data[:,2]
# Learning rate, coefficients, and intercept
# The model is y = theta1*x1 + theta2*x2 + theta0
lr=0.00001
theta0=0
theta1=0
theta2=0
# Maximum number of iterations; gradient descent updates the parameters repeatedly
epochs=10000
# Least squares loss (cost) function
def compute_error(theta0,theta1,theta2,x_data,y_data):
    totalerror=0
    for i in range(0,len(x_data)):  # sum over all sample points
        totalerror=totalerror+(y_data[i]-(theta1*x_data[i,0]+theta2*x_data[i,1]+theta0))**2
    return totalerror/float(len(x_data))/2
# Gradient descent for the parameters
def gradient_descent_runner(x_data,y_data,theta0,theta1,theta2,lr,epochs):
    m=len(x_data)
    for i in range(epochs):
        theta0_grad=0
        theta1_grad=0
        theta2_grad=0
        for j in range(0,m):
            theta0_grad-=(1/m)*(-(theta1*x_data[j,0]+theta2*x_data[j,1]+theta0)+y_data[j])
            theta1_grad-=(1/m)*x_data[j,0]*(-(theta1*x_data[j,0]+theta2*x_data[j,1]+theta0)+y_data[j])
            theta2_grad-=(1/m)*x_data[j,1]*(-(theta1*x_data[j,0]+theta2*x_data[j,1]+theta0)+y_data[j])
        theta0=theta0-lr*theta0_grad
        theta1=theta1-lr*theta1_grad
        theta2=theta2-lr*theta2_grad
    return theta0,theta1,theta2
# Run the iteration
theta0,theta1,theta2=gradient_descent_runner(x_data,y_data,theta0,theta1,theta2,lr,epochs)
print("Multiple linear regression equation: y=",theta1,"X1+",theta2,"X2+",theta0)
ax=plt.figure().add_subplot(111,projection='3d')
ax.scatter(x_data[:,0],x_data[:,1],y_data,c='r',marker='o')
x0=x_data[:,0]
x1=x_data[:,1]
# Build the grid matrices
x0,x1=np.meshgrid(x0,x1)
z=theta0+theta1*x0+theta2*x1
# Draw the 3D plot
ax.plot_surface(x0,x1,z)
ax.set_xlabel('Shop area (tsubo)',fontsize=20)
ax.set_ylabel('Distance to nearest station (m)',fontsize=20)
ax.set_zlabel('Monthly turnover (10,000 JPY)',fontsize=20)
plt.show()

Multiple linear regression equation: y= 45.0533119768975 X1+ -0.19626929358281256 X2+ 5.3774162274868

[Figure: the fitted regression plane with the data points scattered in red]

(3) Comparative Analysis

Comparing the two sets of results, the parameters found by gradient descent carry a noticeably larger error and lower accuracy than those from least squares; with a small fixed learning rate, unscaled features, and only 10,000 iterations, the descent likely has not fully converged. The least squares method, whether via the sklearn library or the closed-form formula, gives the more accurate result.
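One way to quantify the comparison is to evaluate the same cost function at both parameter sets; the lower cost identifies the better fit. A sketch, assuming x_data, y_data, compute_error, and the fitted parameters above are still in scope, and that the CSV and the Excel file hold the same dataset:

# Cost of the gradient descent fit vs. the least squares fit
# (compute_error's model is theta1*x1 + theta2*x2 + theta0, so b maps to theta0)
print("gradient descent cost:", compute_error(theta0, theta1, theta2, x_data, y_data))
print("least squares cost:   ", compute_error(b, a1, a2, x_data, y_data))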


Source: blog.csdn.net/miss_bear/article/details/105313604