1. Linear Regression with One Variable

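In this part of the exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities, and you have profit and population data for those cities.

You would like to use this data to help you select which city to expand to next.

The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative profit indicates a loss.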

1.1 Plotting the Data

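Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot, since it has only two properties to plot (profit and population). (Many other problems you will encounter in real life are multi-dimensional and cannot be plotted on a 2-D plot.)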

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

path =  'ex1data1.txt'
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])

data.head()
Out[1]: 
   Population   Profit
0      6.1101  17.5920
1      5.5277   9.1302
2      8.5186  13.6620
3      7.0032  11.8540
4      5.8598   6.8233

data.describe()
Out[2]: 
       Population     Profit
count   97.000000  97.000000
mean     8.159800   5.839135
std      3.869884   5.510262
min      5.026900  -2.680700
25%      5.707700   1.986900
50%      6.589400   4.562300
75%      8.578100   7.046700
max     22.203000  24.147000

Let's see what the data looks like.

data.plot(kind='scatter', x='Population', y='Profit', figsize=(12,8))
plt.show()

1.2 Gradient Descent

In this part, you will fit the linear regression parameters θ to the dataset using gradient descent.

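1.2.1 Update Equations

The objective of linear regression is to minimize the cost function

$J\left ( \theta \right ) = \frac{1}{2m} \sum_{i=1}^{m} \left ( h_{\theta }\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )}\right )^{2}$

where the hypothesis $h_{\theta }\left ( x \right )$ is given by the linear model

$h_{\theta }\left ( x \right ) = \theta ^{T}x = \theta _{0} + \theta _{1}x_{1}$

Recall that the parameters of your model are the $\theta _{j}$ values. These are the values you will adjust to minimize the cost J(θ). One way to do this is the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update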

$\theta _{j} := \theta _{j} - \alpha \frac{1}{m}\sum_{i=1}^{m}\left ( h_{\theta }\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )} \right )x_{j}^{\left ( i \right )}$ (simultaneously update $\theta _{j}$ for all $j$).

def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))
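
To sanity-check the cost formula, here is a minimal sketch on a hypothetical three-point dataset (not taken from ex1data1.txt), using plain NumPy arrays rather than np.matrix. The helper name cost_ndarray and the toy values are assumptions made for illustration only.

```python
import numpy as np

def cost_ndarray(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    residuals = X @ theta - y          # shape (m,)
    return np.sum(residuals ** 2) / (2 * len(X))

# Hypothetical toy data: first column is the intercept term, and y = x exactly.
X_toy = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
y_toy = np.array([1.0, 2.0, 3.0])

print(cost_ndarray(X_toy, y_toy, np.array([0.0, 0.0])))  # (1 + 4 + 9) / 6
print(cost_ndarray(X_toy, y_toy, np.array([0.0, 1.0])))  # perfect fit, cost 0
```

With θ = (0, 0) every prediction is 0, so the cost is the sum of squared targets divided by 2m; with θ = (0, 1) the line passes through every point and the cost vanishes.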

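With each step of gradient descent, the parameters $\theta _{j}$ come closer to the optimal values that achieve the lowest cost J(θ).

Note: We store each example as a row in the X matrix. To account for the intercept term ( $\theta _{0}$ ), we add an extra first column of ones to X, which lets us treat $\theta _{0}$ as simply another "feature".

Add a column of ones to the training set so that we can use a vectorized solution to compute the cost and gradients.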

data.insert(0, 'Ones', 1)

data.head()
Out[3]: 
   Ones  Population   Profit
0     1      6.1101  17.5920
1     1      5.5277   9.1302
2     1      8.5186  13.6620
3     1      7.0032  11.8540
4     1      5.8598   6.8233
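
As a small illustration of why the column of ones helps, the sketch below (hypothetical parameter values, plain NumPy arrays) shows that a single matrix product reproduces $\theta _{0} + \theta _{1}x$ for every row at once.

```python
import numpy as np

# Hypothetical inputs and parameters, for illustration only.
x = np.array([6.1101, 5.5277, 8.5186])
theta = np.array([-1.0, 2.0])             # [theta_0, theta_1]

# Prepend a column of ones so theta_0 is multiplied by 1 in every row.
X = np.column_stack([np.ones_like(x), x])

vectorized = X @ theta                    # one matrix-vector product
explicit = theta[0] + theta[1] * x        # the hypothesis written out

print(np.allclose(vectorized, explicit))
```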

Separate the training data X from the target variable y.

# set X (training data) and y (target variable)
cols = data.shape[1]
X = data.iloc[:,0:cols-1]  # X is all rows, every column except the last
y = data.iloc[:,cols-1:cols]  # y is all rows, last column only

Check that X (the training set) and y (the target variable) look correct.

X.head()  # head() shows the first 5 rows
Out[4]: 
   Ones  Population
0     1      6.1101
1     1      5.5277
2     1      8.5186
3     1      7.0032
4     1      5.8598

y.head()
Out[5]: 
    Profit
0  17.5920
1   9.1302
2  13.6620
3  11.8540
4   6.8233

The cost function expects NumPy matrices, so we need to convert X and y before we can use them. We also need to initialize theta.

theta is a (1, 2) matrix.

X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix(np.array([0,0]))

theta
Out[6]: matrix([[0, 0]])

Check the dimensions.

X.shape, theta.shape, y.shape
Out[7]: ((97, 2), (1, 2), (97, 1))
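
One detail worth keeping in mind: computeCost relies on `*` meaning matrix multiplication, which holds for np.matrix but not for ordinary NumPy arrays, where `*` is elementwise. A minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical 3-example design matrix and parameter row vector.
X_arr = np.array([[1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])
theta_arr = np.array([[0.5, 2.0]])        # shape (1, 2)

# With np.matrix, * is matrix multiplication: (3, 2) * (2, 1) -> (3, 1).
as_matrix = np.matrix(X_arr) * np.matrix(theta_arr).T

# With plain arrays, use @ for the same product; * would be elementwise.
as_array = X_arr @ theta_arr.T

print(as_matrix.shape, as_array.shape)
print(np.allclose(as_matrix, as_array))
```

If the code is ported to plain arrays, replacing `*` with `@` keeps the (97, 2) by (2, 1) product intact.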

Compute the cost function (with theta initialized to 0).

computeCost(X, y, theta)
Out[8]: 32.072733877455676

1.2.2 Batch Gradient Descent

$\theta _{j} := \theta _{j} - \alpha \frac{\partial }{\partial \theta _{j}}J\left ( \theta \right )$

def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))  # theta.shape is (1, 2); temp is a zero-filled matrix of the same shape
    parameters = int(theta.ravel().shape[1])  # ravel() on a matrix returns a 1 x n matrix, so shape[1] is the number of parameters
    cost = np.zeros(iters)
    
    for i in range(iters):
        error = (X * theta.T) - y
        
        for j in range(parameters):
            term = np.multiply(error, X[:,j])  # np.multiply is elementwise; the result has the same shape as its inputs
            temp[0,j] = theta[0,j] - ((alpha / len(X)) * np.sum(term))
            
        theta = temp
        cost[i] = computeCost(X, y, theta)
        
    return theta, cost
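
The inner loop over j can also be written as a single matrix product, since the gradient of the cost is $\frac{1}{m}X^{T}\left ( X\theta - y \right )$. The following is a sketch of that variant using plain NumPy arrays; the function name gradient_descent_vec and the toy data are assumptions for illustration, not part of the original exercise.

```python
import numpy as np

def gradient_descent_vec(X, y, theta, alpha, iters):
    """Batch gradient descent with theta as a 1-D array of length n."""
    m = len(X)
    cost = np.zeros(iters)
    for i in range(iters):
        error = X @ theta - y                        # residuals, shape (m,)
        theta = theta - (alpha / m) * (X.T @ error)  # update every theta_j at once
        cost[i] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, cost

# Hypothetical toy data generated from y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X_toy = np.column_stack([np.ones_like(x), x])
y_toy = 1.0 + 2.0 * x

theta_fit, cost_hist = gradient_descent_vec(X_toy, y_toy, np.zeros(2), 0.1, 2000)
print(theta_fit)        # close to [1, 2]
print(cost_hist[-1])    # close to 0
```

Because the whole parameter vector is replaced in one assignment, the simultaneous update is automatic and no temporary copy is needed.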

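Initialize some additional variables: the learning rate α and the number of iterations to perform.

alpha = 0.01  # learning rate
iters = 1000  # number of iterations

Now let's run the gradient descent algorithm to fit our parameters θ to the training set.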

g, cost = gradientDescent(X, y, theta, alpha, iters)

g
Out[9]: matrix([[-3.24140214,  1.1272942 ]])

Finally, we can compute the cost (error) of the trained model using our fitted parameters.

computeCost(X, y, g)
Out[10]: 4.5159555030789118

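Now let's plot the linear model along with the data to see visually how well it fits.

x = np.linspace(data.Population.min(), data.Population.max(), 100)  # linspace returns evenly spaced numbers over the given interval
f = g[0, 0] + (g[0, 1] * x)  # the fitted line f = a + b*x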

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()
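
With the fitted parameters, a prediction for a new input is just $\theta _{0} + \theta _{1}x$. A minimal sketch, hard-coding the values of g printed above and using a hypothetical population value of 7.0 (in the same units as the Population column):

```python
import numpy as np

# Fitted parameters copied from the gradient descent output above.
theta = np.array([-3.24140214, 1.1272942])

# Hypothetical input, in the same units as the Population column.
population = 7.0

# Prepend 1 for the intercept term, then take the dot product.
predicted_profit = np.array([1.0, population]) @ theta
print(predicted_profit)
```

The result is in the same units as the Profit column.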

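1.2.3 Visualize Cost Data

Since the gradient descent function also returns a vector with the cost at each training iteration, we can plot that as well.

Notice that the cost decreases at every iteration. This is expected here: the cost function is convex in θ, and the learning rate is small enough that no step overshoots.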

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters), cost, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()

The cost drops sharply over the first few iterations and then levels off.
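
Whether the cost decreases at every step depends on the learning rate. The sketch below uses hypothetical toy data to compare a small α with a deliberately oversized one, checking monotonicity with np.diff; the helper run_gd is an assumption made for illustration.

```python
import numpy as np

def run_gd(X, y, alpha, iters):
    """Return the cost after each batch gradient descent step."""
    m = len(X)
    theta = np.zeros(X.shape[1])
    costs = []
    for _ in range(iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        costs.append(np.sum((X @ theta - y) ** 2) / (2 * m))
    return np.array(costs)

# Hypothetical toy data generated from y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X_toy = np.column_stack([np.ones_like(x), x])
y_toy = 1.0 + 2.0 * x

small = run_gd(X_toy, y_toy, alpha=0.1, iters=50)
large = run_gd(X_toy, y_toy, alpha=1.0, iters=50)

print(np.all(np.diff(small) <= 0))   # small step: cost never increases
print(np.all(np.diff(large) <= 0))   # oversized step: cost grows instead
```

A cost curve that rises or oscillates is therefore a common sign that α should be reduced.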

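2. Linear Regression with Multiple Variables

In this part, you will implement linear regression with multiple variables to predict house prices. Suppose you are selling your house and want to know what a good market price would be. One way to find out is to first collect information on recently sold houses and build a model of housing prices.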

path =  'ex1data2.txt'
data2 = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])
data2.head()

Out[11]: 
   Size  Bedrooms   Price
0  2104         3  399900
1  1600         3  329900
2  2400         3  369000
3  1416         2  232000
4  3000         4  539900

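Looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, performing feature scaling first can make gradient descent converge much more quickly.

This is easy to do with pandas.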

data2 = (data2 - data2.mean()) / data2.std()
data2.head()

Out[12]: 
       Size  Bedrooms     Price
0  0.130010 -0.223675  0.475747
1 -0.504190 -0.223675 -0.084074
2  0.502476 -0.223675  0.228626
3 -0.735723 -1.537767 -0.867025
4  1.257476  1.090417  1.595389
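
If the model is later used on new houses, the same mean and standard deviation need to be reapplied, and predictions mapped back to the original price scale. A minimal sketch under stated assumptions: the five rows are those shown in the first head() output above, the new house is made up, and theta_scaled is a made-up parameter vector, not a fitted result. Note that pandas' std() uses the sample standard deviation (ddof=1) by default, whereas np.std defaults to ddof=0.

```python
import numpy as np
import pandas as pd

# The first five rows of ex1data2.txt, as shown in the head() output above.
raw = pd.DataFrame({'Size': [2104, 1600, 2400, 1416, 3000],
                    'Bedrooms': [3, 3, 3, 2, 4],
                    'Price': [399900, 329900, 369000, 232000, 539900]})

# Keep the statistics so new inputs can be scaled the same way.
mu, sigma = raw.mean(), raw.std()      # pandas std() uses ddof=1
scaled = (raw - mu) / sigma

# Hypothetical parameters [theta_0, theta_Size, theta_Bedrooms] in scaled space.
theta_scaled = np.array([0.0, 0.9, -0.05])

# Scale a hypothetical new house with the stored statistics, predict,
# then undo the scaling to get back to the original price units.
new_house = pd.Series({'Size': 1650, 'Bedrooms': 3})
z = (new_house - mu[['Size', 'Bedrooms']]) / sigma[['Size', 'Bedrooms']]
price_scaled = theta_scaled[0] + theta_scaled[1:] @ z.to_numpy()
price = price_scaled * sigma['Price'] + mu['Price']
print(round(price, 2))
```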

Now let's repeat the preprocessing steps from part 1 and run the linear regression procedure on the new dataset.

# add ones column
data2.insert(0, 'Ones', 1)

# set X (training data) and y (target variable)
cols = data2.shape[1]
X2 = data2.iloc[:,0:cols-1]
y2 = data2.iloc[:,cols-1:cols]

# convert to matrices and initialize theta
X2 = np.matrix(X2.values)
y2 = np.matrix(y2.values)
theta2 = np.matrix(np.array([0,0,0]))

# perform linear regression on the data set
g2, cost2 = gradientDescent(X2, y2, theta2, alpha, iters)

# get the cost (error) of the model
computeCost(X2, y2, g2)

Out[13]: 0.13070336960771892

We can take a quick look at the training progress for this one as well.

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters), cost2, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()

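Instead of implementing these algorithms from scratch, we could also use scikit-learn's linear regression function. Let's apply scikit-learn's linear regression algorithm to the data from part 1 and see what it comes up with.

from sklearn import linear_model
model = linear_model.LinearRegression()
# newer scikit-learn versions reject np.matrix input, so pass plain ndarrays
model.fit(np.asarray(X), np.asarray(y))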

Out[14]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

Predictions of the scikit-learn model:

x = np.array(X[:, 1].A1)
f = model.predict(np.asarray(X)).flatten()

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()

3. Normal Equations

In the lecture videos, you learned that the closed-form solution to linear regression is:

$\theta = \left ( X^{T}X \right )^{-1} X^{T} \vec{y}$

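Using this formula does not require any feature scaling, and you get an exact solution in one calculation: there is no "loop until convergence" as in gradient descent. Although you don't need to scale your features, we still need to add a column of ones to the X matrix so that it has an intercept term ( $\theta _{0}$ ).

The normal equation finds the parameters that minimize the cost function by solving the following equation: $\frac{\partial }{\partial \theta _{j}} J\left ( \theta \right ) = 0$ for every $j$.

Suppose our training set feature matrix is X (including $x_{0}$ = 1) and our training set targets form the vector y. The normal equation then gives the vector $\theta = \left ( X^{T}X \right )^{-1} X^{T}y$ .

The superscript T denotes the matrix transpose and the superscript -1 denotes the matrix inverse. Let $A = X^{T}X$ ; then $\left ( X^{T}X \right )^{-1} = A^{-1}$ .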
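
For completeness, the closed form follows from writing the cost in matrix form and setting its gradient to zero. This is the standard derivation, assuming $X^{T}X$ is invertible:

$J\left ( \theta \right ) = \frac{1}{2m}\left ( X\theta - y \right )^{T}\left ( X\theta - y \right )$

$\nabla _{\theta }J\left ( \theta \right ) = \frac{1}{m}X^{T}\left ( X\theta - y \right ) = 0$

$X^{T}X\theta = X^{T}y \Rightarrow \theta = \left ( X^{T}X \right )^{-1}X^{T}y$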

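Gradient descent compared with the normal equation:

Gradient descent: requires choosing a learning rate α; needs many iterations; still works well when the number of features n is large; applies to many types of models.

Normal equation: no learning rate α to choose; computed in a single step; requires computing $\left ( X^{T} X\right )^{-1}$ , which becomes expensive when the number of features n is large because inverting a matrix takes $O\left ( n^{3} \right )$ time; generally still acceptable when n is below about 10,000; applies only to linear models and is not suitable for other models such as logistic regression.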

# normal equation
def normalEqn(X, y):
    theta = np.linalg.inv(X.T@X)@X.T@y  # X.T@X is equivalent to X.T.dot(X)
    return theta

final_theta2 = normalEqn(X, y)  # differs somewhat from the theta found by batch gradient descent
final_theta2

Out[15]: 
matrix([[-3.89578088],
        [ 1.19303364]])
# Gradient descent gave matrix([[-3.24140214,  1.1272942 ]]); 1000 iterations at alpha = 0.01 had not fully converged
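
Explicitly inverting $X^{T}X$ works for this small problem, but solving the linear system directly is generally preferred numerically. The sketch below, on hypothetical toy data, compares the explicit inverse with np.linalg.solve and np.linalg.lstsq; the function names normal_eqn_inv and normal_eqn_solve are assumptions for illustration.

```python
import numpy as np

def normal_eqn_inv(X, y):
    """Closed-form solution using an explicit inverse, as in normalEqn."""
    return np.linalg.inv(X.T @ X) @ X.T @ y

def normal_eqn_solve(X, y):
    """Solve (X^T X) theta = X^T y without forming the inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical toy data generated from y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X_toy = np.column_stack([np.ones_like(x), x])
y_toy = 1.0 + 2.0 * x

theta_inv = normal_eqn_inv(X_toy, y_toy)
theta_solve = normal_eqn_solve(X_toy, y_toy)
# lstsq solves the least-squares problem directly and also copes with
# rank-deficient X; rcond=None selects NumPy's default cutoff.
theta_lstsq, *_ = np.linalg.lstsq(X_toy, y_toy, rcond=None)

print(theta_inv, theta_solve, theta_lstsq)   # all close to [1, 2]
```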

[Andrew Ng] Machine Learning Exercise 1: Linear Regression
