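Regression algorithms: a family of supervised learning algorithms.
Regression is one of the more commonly used kinds of machine learning algorithm. It describes the relationship between an independent variable X and a dependent variable Y. From a machine learning perspective, this means building a model that maps the features X to the label Y.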
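
For the linear case used in the example in this post, this mapping can be written out explicitly. The sketch below assumes a model without an intercept term, which matches the two-feature code in the example. The symbols X (the n-by-2 feature matrix), y (the column of targets), theta (the coefficients) and J (the squared-error cost) are notation introduced here for illustration.

```latex
\begin{align}
\hat{y} &= X\theta \\
J(\theta) &= \tfrac{1}{2}\,\lVert X\theta - y \rVert_2^2 \\
\nabla_\theta J(\theta) &= X^{\top}(X\theta - y) = 0
\;\Longrightarrow\;
\theta = (X^{\top}X)^{-1}X^{\top}y
\end{align}
```

The closed form in the last line only exists when X^T X is invertible, that is, when the feature columns are linearly independent.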

Ordinary least squares linear regression example

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# Configure a font that supports Chinese so that Chinese text in plots is not garbled
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Load the data
path = './datas/household_power_consumption_1000.txt'
df = pd.read_csv(path, sep=';')
print(df.head())
# Use the power values as the features X and the current as the target Y
X = df.iloc[:, 2:4]  # power values: all rows, columns 2 and 3
print(X.head())
Y = df.iloc[:, 5]  # current values
# print(Y)

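# Split into training and test data: the first 80% of rows for training, the rest for testing
n = int(X.shape[0] * 0.8)
train_x = np.array(X[:n])
test_x = np.array(X[n:])
train_y = np.array(Y[:n])
test_y = np.array(Y[n:])
print("Total samples: {}, training samples: {}, test samples: {}".format(X.shape[0], train_x.shape[0], test_x.shape[0]))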

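# Train the model
# a. Convert the training data to 2-D float arrays (y becomes a column vector)
x = np.asarray(train_x, dtype=float)
y = np.asarray(train_y, dtype=float).reshape(-1, 1)
# b. Solve for theta with the normal equation: theta = (x^T x)^-1 x^T y
theta = np.linalg.inv(x.T @ x) @ x.T @ y
print(theta.shape)
print("Solved theta: {}".format(theta))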
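
Inverting x^T x explicitly works for a small example like this one, but it can be numerically fragile when the features are nearly collinear. As an alternative sketch, and not the method used in this post, `np.linalg.lstsq` solves the same least-squares problem without forming the inverse. The data below is synthetic, and the names `fit_ols`, `x_demo`, `y_demo` and `true_theta` are hypothetical.

```python
import numpy as np


def fit_ols(x, y):
    """Return the coefficients theta that minimize ||x @ theta - y||^2."""
    theta, _residuals, _rank, _singular_values = np.linalg.lstsq(x, y, rcond=None)
    return theta


# Synthetic, noise-free data for illustration only: y = 4*x1 + 1*x2
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(50, 2))
true_theta = np.array([[4.0], [1.0]])
y_demo = x_demo @ true_theta

theta_lstsq = fit_ols(x_demo, y_demo)

# The normal-equation solution, computed the same way as in the example
theta_normal = np.linalg.inv(x_demo.T @ x_demo) @ x_demo.T @ y_demo

# Both approaches should agree up to floating-point error
print(np.allclose(theta_lstsq, theta_normal))
```

On well-conditioned data the two results agree, so the choice mainly matters when x^T x is close to singular.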

# Evaluate the model: predict on the test set
y_pre = np.asarray(test_x, dtype=float) @ theta
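
A plot gives a visual impression of the fit, but a numeric score makes the evaluation easier to compare across models. Here is a minimal sketch of two common metrics, mean squared error and R^2. The helper names `mse` and `r2_score` and the sample arrays are hypothetical and are not part of the original code.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between two arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up values for illustration only, not results from the dataset
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.0, 2.0, 3.0, 5.0])

demo_mse = mse(y_true_demo, y_pred_demo)
demo_r2 = r2_score(y_true_demo, y_pred_demo)
print(demo_mse, demo_r2)
```

With the variables from this example, the helpers could be called as `mse(test_y, y_pre)` and `r2_score(test_y, y_pre)`.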

# Plot the predictions against the actual values
t = np.arange(len(test_x))
plt.figure(facecolor='w')
plt.plot(t, y_pre, 'g-', linewidth=2, label='Predicted')
plt.plot(t, test_y, 'r-', linewidth=2, label='Actual')
plt.legend(loc='lower right')
plt.title('Linear regression')
plt.show()

# Model storage: save the theta values to a database; when needed, load them back into the model.
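
As one possible way to do this, the sketch below stores theta as a JSON string in a SQLite database through Python's built-in `sqlite3` module and then reads it back. The post does not say which database is used, so the in-memory database, the table name `model_params`, its column names, the model name `ols_power` and the helper functions are all assumptions for illustration.

```python
import json
import sqlite3

import numpy as np


def save_theta(conn, name, theta):
    """Store theta under the given model name, replacing any earlier value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_params (name TEXT PRIMARY KEY, theta TEXT)"
    )
    payload = json.dumps(np.asarray(theta, dtype=float).tolist())
    conn.execute(
        "INSERT OR REPLACE INTO model_params (name, theta) VALUES (?, ?)",
        (name, payload),
    )
    conn.commit()


def load_theta(conn, name):
    """Load theta for the given model name as a NumPy array."""
    row = conn.execute(
        "SELECT theta FROM model_params WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return np.array(json.loads(row[0]))


# In-memory database and placeholder coefficients, for illustration only
conn = sqlite3.connect(":memory:")
theta_demo = np.array([[4.2], [1.5]])  # not fitted results
save_theta(conn, "ols_power", theta_demo)
restored = load_theta(conn, "ols_power")
print(restored.shape)
```

Storing the coefficients as a nested list keeps the column-vector shape, so the restored array can be used in the same matrix product as the original theta.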

theta1 = theta[0, 0]  # coefficient for global_active_power
theta2 = theta[1, 0]  # coefficient for global_reactive_power

# Produce a prediction
global_active_power = 4.216
global_reactive_power = 0.418
print("Input feature values: {}----------{}".format(global_active_power, global_reactive_power))
print("Predicted value: {}".format(global_active_power * theta1 + global_reactive_power * theta2))
