# Gradient descent method for solving linear regression

## Gradient descent

Gradient descent (also called the method of steepest descent) is a first-order optimization algorithm. To find a local minimum of a function with gradient descent, we iteratively search by stepping a prescribed distance from the current point in the direction opposite to the gradient (or an approximate gradient) of the function at that point. If instead we step in the positive direction of the gradient, the iterates approach a local maximum of the function; that procedure is called gradient ascent.

## An intuitive picture of gradient descent

Imagine a valley whose lowest point you want to reach. You are currently standing at some point A on the mountainside, and you can use gradient descent to find the bottom: each time, taking your current position as the reference, pick the steepest downhill direction, walk some distance down the slope in that direction, and repeat. Step by step, you will eventually reach the bottom of the valley.

## The theory of gradient descent

Before stating the principle, we first introduce a few concepts.

### Derivatives

Geometrically, the derivative of a function at a point can be seen as the slope of the function at that point. There are single-variable derivatives and multi-variable (partial) derivatives:

$\frac{d(x^2)}{dx}=2x$
$\frac{\partial}{\partial x} (x^2y)=2xy$
$\frac{\partial}{\partial y}(x^2y)=x^2$
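
As a quick numerical check (a minimal sketch, not from the original post), a finite-difference approximation reproduces these derivatives at a sample point:

# Minimal sketch (assumption): check the derivatives above with central finite differences.
def numeric_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

print(numeric_derivative(lambda x: x ** 2, 3.0))        # ~6.0, matches 2x at x = 3
print(numeric_derivative(lambda x: x ** 2 * 5.0, 3.0))  # ~30.0, matches 2xy at x = 3 with y = 5 held fixed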

### Gradient

The gradient is a vector. It points in the direction in which the directional derivative of the function at that point is largest; in other words, the function changes fastest along the gradient direction at that point, and the maximum rate of change equals the norm of the gradient.

Concretely, the gradient at a point is the vector formed by the partial derivatives of the function with respect to each of its variables. For example:

$J(\Theta)=1+2\Theta_1-3\Theta_2+4\Theta_3$
$\nabla J(\Theta)= <\frac{\partial J}{\partial\Theta_1},\frac{\partial J}{\partial\Theta_2},\frac{\partial J}{\partial\Theta_3}> = <2,-3,4>$
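
As a check (a minimal sketch, not from the original post, assuming SymPy is available), the same partial derivatives can be computed symbolically:

import sympy

# Minimal sketch (assumption): compute the gradient of J symbolically with SymPy.
t1, t2, t3 = sympy.symbols('Theta_1 Theta_2 Theta_3')
J = 1 + 2 * t1 - 3 * t2 + 4 * t3
print([sympy.diff(J, v) for v in (t1, t2, t3)])  # [2, -3, 4]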

### The mathematical principle of gradient descent

$\Theta_1=\Theta_0-\alpha\nabla J(\Theta)$

Explanation of the formula: \(\Theta_0\) is the current position, \(\Theta_1\) is the next position, \(\alpha\) is the step size, and \(\nabla J\) is the gradient of the function at the current point. The minus sign means we move against the gradient, i.e., downhill.

In machine learning, \(\alpha\) is called the learning rate (or step size). We use \(\alpha\) to control how far we move at each step: it should be neither too large nor too small.
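
In code, a single update step looks roughly like this (a minimal sketch, not from the original post; grad_J stands for whatever function computes the gradient):

# Minimal sketch (assumption): one generic gradient descent step.
def gradient_descent_step(theta, grad_J, alpha):
    # move against the gradient, scaled by the learning rate alpha
    return theta - alpha * grad_J(theta)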

## A worked example of gradient descent

Now we have a function of a single variable:

$J(\Theta)=\Theta^2$

Differentiating the function gives:

$J'(\Theta)=2\Theta$

Set the starting point \(\Theta_0 = 1\) and the learning rate \(\alpha = 0.4\).

The gradient descent update rule is

$\Theta_{n+1}=\Theta_n-\alpha\,J'(\Theta_n)$

Iterating repeatedly, we get:

$\Theta_0=1$
$\Theta_1=0.2$
$\Theta_2=0.04$
$\Theta_3=0.008$
$\Theta_4=0.0016$

After \(4\) iterations, the result is already close to the minimum of the function.
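
These iterations are easy to reproduce in a few lines of Python (a minimal sketch, not from the original post):

# Minimal sketch (assumption): reproduce the univariate iterations above.
theta = 1.0     # Theta_0
alpha = 0.4     # learning rate
for _ in range(4):
    theta = theta - alpha * 2 * theta   # J'(theta) = 2 * theta
    print(theta)   # roughly 0.2, 0.04, 0.008, 0.0016 (floating-point rounding aside)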

The procedure for a function of several variables is the same as in the single-variable case, with the gradient taking the place of the derivative.
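
For instance (a minimal sketch, not from the original post, using \(J(\Theta_1,\Theta_2)=\Theta_1^2+\Theta_2^2\) as an assumed test function), the same update applied to each coordinate drives both toward the minimum at the origin:

# Minimal sketch (assumption): gradient descent on J(t1, t2) = t1^2 + t2^2.
t1, t2 = 1.0, 3.0   # starting point
alpha = 0.1         # learning rate
for _ in range(100):
    g1, g2 = 2 * t1, 2 * t2                      # gradient of J
    t1, t2 = t1 - alpha * g1, t2 - alpha * g2    # update both coordinates
print(t1, t2)   # both very close to 0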

## Solving linear regression with gradient descent

House price versus area (the data are listed in the table below):

| No. | Area | Price |
| --- | ---- | ----- |
| 1   | 150  | 6450  |
| 2   | 200  | 7450  |
| 3   | 250  | 8450  |
| 4   | 300  | 9450  |
| 5   | 350  | 11450 |
| 6   | 400  | 15450 |
| 7   | 600  | 18450 |

We use gradient descent to solve the linear regression, that is, to find \(\Theta_0\) and \(\Theta_1\) in

$h_\Theta(x)=\Theta_0+\Theta_1x$

Our goal is to make the estimated values as close as possible to the actual values, so we define a cost function; here we use the mean squared error cost function:

$J(\Theta)=\frac{1}{2m}\sum_{i=1}^m(h_\Theta(x_i)-y_i)^2$

That is,

$J(\Theta)=\frac{1}{2m}\sum_{i=1}^m(\Theta_0+\Theta_1x_i-y_i)^2$

where \(h_\Theta(x_i)=\Theta_0+\Theta_1x_i\). We take the partial derivatives of \(J\) with respect to \(\Theta_0\) and \(\Theta_1\):

$\nabla J(\Theta)= <\frac{\partial J}{\partial \Theta_0}, \frac{\partial J}{\partial \Theta_1}>$

where

$\frac{\partial J}{\partial \Theta_0}=\frac{1}{m}\sum_{i=1}^m(h_\Theta(x_i)-y_i)$
$\frac{\partial J}{\partial \Theta_1}=\frac{1}{m}\sum_{i=1}^m(h_\Theta(x_i)-y_i)x_i$

Now it is time for the code:

import math

m = 7                     # size of the data set
Theta0 = 300              # initial value of Theta0 (intercept)
Theta1 = 100              # initial value of Theta1 (slope)
alpha = 0.000000001       # learning rate

area = [150, 200, 250, 300, 350, 400, 600]              # data set
price = [6450, 7450, 8450, 9450, 11450, 15450, 18450]

def gradientx(Theta0, Theta1):    # partial derivative of J with respect to Theta0
    ans = 0
    for i in range(m):
        ans = ans + Theta0 + Theta1 * area[i] - price[i]
    return ans / m

def gradienty(Theta0, Theta1):    # partial derivative of J with respect to Theta1
    ans = 0
    for i in range(m):
        ans = ans + (Theta0 + Theta1 * area[i] - price[i]) * area[i]
    return ans / m

# (nowTheta0, nowTheta1) is kept one iteration ahead of (Theta0, Theta1)
nowTheta0 = Theta0 - alpha * gradientx(Theta0, Theta1)
nowTheta1 = Theta1 - alpha * gradienty(Theta0, Theta1)

# gradient descent: stop once consecutive values of Theta1 are close enough
while math.fabs(nowTheta1 - Theta1) > 0.000000001:
    # advance the leading pair by one step
    nowa = nowTheta0 - alpha * gradientx(nowTheta0, nowTheta1)
    nowb = nowTheta1 - alpha * gradienty(nowTheta0, nowTheta1)
    nowTheta0 = nowa
    nowTheta1 = nowb
    # advance the trailing pair, which stays one iteration behind
    nowa = Theta0 - alpha * gradientx(Theta0, Theta1)
    nowb = Theta1 - alpha * gradienty(Theta0, Theta1)
    Theta0 = nowa
    Theta1 = nowb

print(nowTheta0, nowTheta1)
# 299.85496413867725 32.638872688242515
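
The cost function and its partial derivatives can also be written in vectorized form with NumPy (a minimal sketch, not from the original post); for any given \(\Theta_0,\Theta_1\) it should return the same values as gradientx and gradienty above:

import numpy as np

# Minimal sketch (assumption): vectorized versions of the cost and its gradient.
x = np.array([150, 200, 250, 300, 350, 400, 600], dtype=float)
y = np.array([6450, 7450, 8450, 9450, 11450, 15450, 18450], dtype=float)

def cost(Theta0, Theta1):
    residual = Theta0 + Theta1 * x - y
    return np.mean(residual ** 2) / 2                    # J(Theta)

def gradients(Theta0, Theta1):
    residual = Theta0 + Theta1 * x - y
    return np.mean(residual), np.mean(residual * x)      # dJ/dTheta0, dJ/dTheta1

print(gradients(300, 100))   # matches gradientx(300, 100), gradienty(300, 100)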

Plotting the result:

import numpy as np
import matplotlib.pyplot as plt

area = [150, 200, 250, 300, 350, 400, 600]              # data set
price = [6450, 7450, 8450, 9450, 11450, 15450, 18450]

plt.scatter(area, price)                                # the original data points
x = np.arange(100, 700, 100)
y = 32.37648991481203 * x + 299.85496413867725          # fitted line: Theta1 * x + Theta0
plt.plot(x, y)
plt.xlabel('area')
plt.ylabel('price')
plt.show()

Result:

We can see that the line found by gradient descent fits the data well.

The fitting process (the values of \(\Theta_0\) and \(\Theta_1\) at each iteration) is shown in myplot.png.
