Solving for Linear Regression Parameters with Batch Gradient Descent and the Normal Equation (Python Implementation)

Note: this article uses Python 3 and Jupyter Notebook.

Using Batch Gradient Descent

First, import the packages we need:

%matplotlib notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Construct and inspect the data:

# Construct the dataset:
data = {'Ones':[1,1,1,1,1,1,1,1,1],'x_train':[3,5,6,8,12,13,15,20,23],
'y_train':[8,15,23,28,37,49,55,64,85]}
train_data = pd.DataFrame(data)

# Inspect the data (the 'Ones' column in train_data is a helper column added to simplify the matrix computations)
train_data

The structure of train_data is shown in the figure below:

Draw a scatter plot of the data:

# Draw a scatter plot of the data:

fig,axes = plt.subplots()
train_data.plot(kind='scatter',x='x_train',y='y_train',ax=axes)
fig.savefig('p1.png')

The scatter plot produced by the code above is shown below:

Preprocess the data before running gradient descent:

cols = train_data.shape[1]
X = train_data.iloc[:,0:cols-1] # X is every column of train_data except the last
y = train_data.iloc[:,cols-1:cols] # y is the last column

# Convert X and y to matrices:
X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix(np.array([0,0])) # initialize the theta parameters

Solve for the linear regression parameters with the batch gradient descent algorithm:
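For reference, the functions in this section correspond to the usual textbook formulation of the cost function and the update rule. The notation here is an assumption, since the post itself only gives code: m is the number of training examples, h_θ(x) = θ_0 + θ_1 x is the hypothesis, and x_0 = 1 is the 'Ones' helper column.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\qquad \text{(all } j \text{ updated simultaneously)}
```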

# Define the cost function:
def computeCost(X,y,theta):
    inner = np.power(((X*theta.T)-y),2)
    return np.sum(inner)/(2 * len(X))

# Define the gradient descent function:
def batch_gradient_descent(X,y,theta,alpha,iters):
    temp = np.matrix(np.zeros(theta.shape))
    parameters = int(theta.shape[1])
    cost = np.zeros(iters)
    
    for i in range(iters):
        error = (X * theta.T) - y
        
        for j in range(parameters):
            term = np.multiply(error, X[:,j])
            temp[0,j] = theta[0,j] - ((alpha / len(X)) * np.sum(term))
            
        theta = temp
        cost[i] = computeCost(X, y, theta)
        
    return theta, cost  # cost holds the cost function value computed from the updated theta after each iteration

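# Set the learning rate and the number of iterations:
alpha = 0.001
iters = 100

# Call the gradient descent function to compute theta:
g, cost = batch_gradient_descent(X, y, theta, alpha, iters)

# print(g)    theta is [[0.18629876 3.48668984]]

# Fit a line to the original data using the computed theta: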
x = np.linspace(train_data.x_train.min(), train_data.x_train.max(), 100)
f = g[0, 0] + (g[0, 1] * x)

fig, axes = plt.subplots()
axes.plot(x, f, 'r', label='Prediction')  # fitted line
axes.scatter(train_data.x_train, train_data.y_train, label='Training Data') # scatter plot of the original data
axes.legend(loc='best')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('Predicted x vs. y')
fig.savefig('p2.png')

The plot produced by the code above is shown below:

We can also plot the value of the cost function against the number of iterations:

fig, axes = plt.subplots()
axes.plot(np.arange(iters), cost, 'r')
axes.set_xlabel('Iterations')
axes.set_ylabel('Cost')
axes.set_title('Error vs. Training Epoch')
fig.savefig('p3.png')

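The resulting plot is shown below:

From the plot above, we can see that by about the 20th iteration the value of the cost function is already close to converging. When we solved for the linear regression parameters with gradient descent earlier, we set the maximum number of iterations to 100, and the theta value computed at that point is: matrix([[0.18629876, 3.48668984]])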
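The inner loop over j in batch_gradient_descent can also be written as a single matrix expression. The following is a minimal sketch under some assumptions: it uses plain NumPy arrays instead of np.matrix (NumPy's documentation no longer recommends np.matrix), and the names x_train, y_train, X_arr, compute_cost_vec and batch_gradient_descent_vec are hypothetical names introduced only for this illustration. It is meant to perform the same simultaneous update as the function above, not to replace it.

```python
import numpy as np

# Same data as train_data, rebuilt as plain NumPy arrays (illustrative names).
x_train = np.array([3, 5, 6, 8, 12, 13, 15, 20, 23], dtype=float)
y_train = np.array([8, 15, 23, 28, 37, 49, 55, 64, 85], dtype=float)
X_arr = np.column_stack([np.ones_like(x_train), x_train])  # shape (9, 2)

def compute_cost_vec(X, y, theta):
    """Cost J(theta) = sum((X @ theta - y)**2) / (2 * m)."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def batch_gradient_descent_vec(X, y, theta, alpha, iters):
    """Hypothetical vectorized variant: updates every parameter at once."""
    theta = np.asarray(theta, dtype=float).copy()  # leave the caller's array untouched
    cost = np.zeros(iters)
    m = len(y)
    for i in range(iters):
        gradient = X.T @ (X @ theta - y) / m  # one entry per parameter
        theta = theta - alpha * gradient
        cost[i] = compute_cost_vec(X, y, theta)
    return theta, cost

theta_vec, cost_vec = batch_gradient_descent_vec(
    X_arr, y_train, np.zeros(2), alpha=0.001, iters=100)
print(theta_vec)
print(cost_vec[0], cost_vec[-1])
```

Because the whole gradient is computed from the old theta before anything is updated, no temporary copy of theta is needed, which is the main design difference from the loop-based version.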

Using the Normal Equation
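In matrix notation, the closed-form solution computed in this section can be written as follows, assuming that X^T X is invertible (X is the matrix whose columns are 'Ones' and 'x_train', and y is the column of 'y_train' values):

```latex
\theta = \left( X^{\mathsf{T}} X \right)^{-1} X^{\mathsf{T}} y
```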

# Define the normal equation function:

def normalEqn(X, y):
    theta = np.linalg.inv(X.T@X)@X.T@y  # X.T@X is equivalent to X.T.dot(X)
    return theta

# Call the normal equation function to compute the linear regression parameters:

final_theta = normalEqn(X, y)
# The value of final_theta is:
# matrix([[-1.60933806],[ 3.60460993]])

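The parameters obtained from the normal equation can be regarded as the exact solution (up to floating-point error). As shown earlier, with the maximum number of iterations set to 100, batch gradient descent gives matrix([[0.18629876, 3.48668984]]). To bring the gradient descent result closer to the normal equation solution, increase the maximum number of iterations. In testing, setting the maximum number of iterations to 50000 gives matrix([[-1.60932272, 3.60460892]]), which is very close to the normal equation solution.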
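As a cross-check on the normal equation result, the same least-squares problem can be solved without forming an explicit inverse. The sketch below is only an illustration under assumptions: it rebuilds the data as plain NumPy arrays, and x_train, y_train, X_arr, theta_solve and theta_lstsq are hypothetical names that do not appear in the code above. np.linalg.solve and np.linalg.lstsq are standard NumPy routines and are generally considered numerically safer than np.linalg.inv when X.T @ X is poorly conditioned.

```python
import numpy as np

# Same data as train_data, rebuilt as plain NumPy arrays (illustrative names).
x_train = np.array([3, 5, 6, 8, 12, 13, 15, 20, 23], dtype=float)
y_train = np.array([8, 15, 23, 28, 37, 49, 55, 64, 85], dtype=float)
X_arr = np.column_stack([np.ones_like(x_train), x_train])  # shape (9, 2)

# Option 1: solve the normal equations (X^T X) theta = X^T y directly,
# without computing the inverse of X^T X.
theta_solve = np.linalg.solve(X_arr.T @ X_arr, X_arr.T @ y_train)

# Option 2: let NumPy's least-squares routine handle the problem.
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(
    X_arr, y_train, rcond=None)

print(theta_solve)
print(theta_lstsq)
```

Both results are one-dimensional arrays ordered as (intercept, slope), so they can be compared directly with the entries of final_theta.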

PS1: For a detailed theoretical derivation of batch gradient descent and the normal equation, see: https://blog.csdn.net/qq_41080850/article/details/85292769

PS2: This is an original article by the blog author. Please credit the source when reposting.