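1. Linear Regression
Linear regression assumes that the features and the result are linearly related. This means the output can be obtained by multiplying each input by a constant and summing the results.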
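As a minimal sketch of this idea, the snippet below computes an output as a weighted sum of the inputs plus a constant term. The helper `predict`, the names `weights` and `bias`, and all numbers are illustrative assumptions rather than anything taken from the text.

```python
import numpy as np


def predict(x, weights, bias):
    """Return the linear combination weights . x + bias (hypothetical helper)."""
    return float(np.dot(weights, x) + bias)


# Illustrative values only: two input features with made-up constants.
x = np.array([2.0, 3.0])
weights = np.array([0.5, -1.0])
bias = 4.0
y_hat = predict(x, weights, bias)  # 0.5*2 + (-1.0)*3 + 4
```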
2. Least Squares
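Linear fitting. Choosing a linear or polynomial function as the fitting function is a simple approach to data fitting. Determining a linear fitting function φ(x) = a + bx is called a linear fit of the data. For the linear fitting problem, we need to find the minimum point of the function
$$S(a,b)=\sum_{k=1}^{m}[(a+bx_{k})-y_{k}]^2$$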
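To make the objective concrete, here is a small sketch that evaluates S(a, b) for given data. The function name `squared_error` and the sample data are assumptions made for illustration.

```python
import numpy as np


def squared_error(a, b, x, y):
    """S(a, b) = sum_k [(a + b*x_k) - y_k]^2 for 1-D arrays x and y."""
    residuals = (a + b * x) - y
    return float(np.sum(residuals ** 2))


# Illustrative data (an assumption for this sketch).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # lies exactly on y = 1 + 2x

s_exact = squared_error(1.0, 2.0, x, y)  # parameters that fit the data exactly
s_off = squared_error(0.0, 2.0, x, y)    # intercept off by 1 at each of the 4 points
```

A smaller value of S means the line passes closer to the data; least squares picks the pair (a, b) that makes S as small as possible.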
Differentiating S with respect to each of the two variables gives
$$\begin{aligned} \frac{\partial S}{\partial a} &= 2\sum_{k=1}^{m}[(a+bx_{k})-y_{k}], \\[2ex] \frac{\partial S}{\partial b} &= 2\sum_{k=1}^{m}[(a+bx_{k})-y_{k}]\,x_{k} \end{aligned}$$
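The partial derivatives of S with respect to a and b can be checked numerically. The sketch below compares the analytic expressions (the derivative with respect to b carries an extra factor x_k inside the sum) against central finite differences. The data, the evaluation point, and the step size `h` are arbitrary assumptions.

```python
import numpy as np


def S(a, b, x, y):
    """Sum of squared residuals of the line a + b*x."""
    return float(np.sum(((a + b * x) - y) ** 2))


def grad_S(a, b, x, y):
    """Analytic partial derivatives (dS/da, dS/db)."""
    r = (a + b * x) - y
    return 2.0 * np.sum(r), 2.0 * np.sum(r * x)


# Arbitrary illustrative data and evaluation point.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.5, 1.0, 2.5, 2.0])
a0, b0, h = 0.3, 0.7, 1e-6

dS_da, dS_db = grad_S(a0, b0, x, y)
# Central finite differences as an independent check.
num_da = (S(a0 + h, b0, x, y) - S(a0 - h, b0, x, y)) / (2 * h)
num_db = (S(a0, b0 + h, x, y) - S(a0, b0 - h, x, y)) / (2 * h)
```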
Setting both derivatives to zero gives the normal equations
$$\begin{cases} ma + b\sum_{k=1}^{m}x_k = \sum_{k=1}^{m}y_k \\[2ex] a\sum_{k=1}^{m}x_k + b\sum_{k=1}^{m}x_k^2 = \sum_{k=1}^{m}x_ky_k \end{cases}$$
In matrix form:
$$\begin{bmatrix} m & \sum_{k=1}^{m}x_k \\ \sum_{k=1}^{m}x_k & \sum_{k=1}^{m}x_k^2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^{m}y_k \\ \sum_{k=1}^{m}x_ky_k \end{bmatrix}$$
Solving this system yields a and b.
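Because the system is only 2×2, it can also be solved in closed form with Cramer's rule. The sketch below does this for the same sample data used in the Python example of section 2.1; the variable names (`sx`, `sxy`, `det`, and so on) are assumptions introduced here.

```python
import numpy as np

# Same sample data as the Python example in section 2.1.
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 1, 2, 3, 4, 5.1])
m = x.size  # number of data points

sx, sy = x.sum(), y.sum()
sxx, sxy = (x * x).sum(), (x * y).sum()

# Cramer's rule on the 2x2 normal equations.
det = m * sxx - sx ** 2
a = (sy * sxx - sx * sxy) / det  # intercept
b = (m * sxy - sx * sy) / det    # slope
```

The determinant is zero only when all x_k are equal, in which case no unique line exists.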
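By a derivation similar to the one above, determining the coefficients of the fitting function in the polynomial data-fitting problem also requires solving a system of normal equations.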
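A minimal sketch of the polynomial case, assuming the fitting function has the form φ(x) = c_0 + c_1 x + ... + c_n x^n and writing the normal equations as (VᵀV)c = Vᵀy, where V is the matrix whose columns are the powers of x. The function name `polyfit_normal_equations` and the test data are hypothetical.

```python
import numpy as np


def polyfit_normal_equations(x, y, degree):
    """Least-squares polynomial coefficients c_0..c_degree (lowest power first)."""
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    V = np.vander(x, degree + 1, increasing=True)
    # Normal equations: (V^T V) c = V^T y.
    return np.linalg.solve(V.T @ V, V.T @ y)


# Illustrative data (an assumption): exact samples of y = 1 - 2x + 0.5x^2.
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 - 2.0 * x + 0.5 * x ** 2
coeffs = polyfit_normal_equations(x, y, degree=2)
```

For high degrees the matrix VᵀV tends to become ill-conditioned, so in practice a routine such as `numpy.polyfit` or `numpy.linalg.lstsq` is usually the safer choice.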
2.1 Python Example
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt


def train(train_x, train_y, train_mode='basic'):
    """Fit y = a + b*x and return [slope b, intercept a]."""
    weight = []
    if train_mode == 'basic':
        # Direct solution of the normal equations
        m = train_x.size  # number of samples (top-left entry of the matrix)
        A = np.array([[m, np.sum(train_x)],
                      [np.sum(train_x), np.sum(train_x * train_x)]])
        b = np.array([np.sum(train_y), np.sum(train_x * train_y)]).reshape(-1, 1)
        AI = np.linalg.inv(A)
        w = (AI @ b).tolist()  # w = [[a], [b]]
        print(AI)
        print(b)
        print(w)
        weight.extend([w[1][0], w[0][0]])
        print(weight)
    elif train_mode == 'scikit-learn':
        # scikit-learn solution
        reg = linear_model.LinearRegression()
        reg.fit(train_x, train_y)  # use the function arguments, not the globals
        # y_pre = reg.predict(train_x)
        weight.extend([reg.coef_[0][0], reg.intercept_[0]])
    return weight


if __name__ == '__main__':
    X = np.array([0, 1, 2, 3, 4, 5]).reshape(-1, 1)
    Y = np.array([0, 1, 2, 3, 4, 5.1]).reshape(-1, 1)
    weight = train(X, Y, 'scikit-learn')
    plt.scatter(X, Y, color='black')
    plt.plot(X, weight[0] * X + weight[1])
    plt.show()