Simple Linear Regression Algorithm
import numpy as np
from matplotlib import pyplot as plt
X_train = np.array([1,2,3,5,6])
Y_train = np.array([2,4,5,11,14])
plt.scatter(X_train,Y_train)
<matplotlib.collections.PathCollection at 0x2a571552748>
Least squares formulas
$$
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

$$
b = \bar{y} - a\bar{x}
$$
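For reference, here is a short sketch of where these two formulas come from. The loss function name $J$ is introduced here for illustration and does not appear in the original notebook. The line $y = ax + b$ is chosen to minimize the sum of squared errors:

$$
J(a, b) = \sum_{i=1}^{n} (y_i - a x_i - b)^2
$$

Setting the partial derivative with respect to $b$ to zero gives the intercept:

$$
\frac{\partial J}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a x_i - b) = 0
\quad\Rightarrow\quad
b = \bar{y} - a\bar{x}
$$

Setting the partial derivative with respect to $a$ to zero and substituting $b$ gives the slope:

$$
\frac{\partial J}{\partial a} = -2 \sum_{i=1}^{n} x_i \bigl( (y_i - \bar{y}) - a (x_i - \bar{x}) \bigr) = 0
\quad\Rightarrow\quad
a = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

The last step uses $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, so replacing $x_i$ with $x_i - \bar{x}$ in the leading factor changes neither sum.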
x_mean = np.mean(X_train)
y_mean = np.mean(Y_train)
Compute the value of a from the formula
numerator = 0.0
denominator = 0.0
for x,y in zip(X_train,Y_train):
    numerator += (x-x_mean) * (y-y_mean)
    denominator += (x-x_mean)**2
a = numerator / denominator
a
2.4186046511627906
b = y_mean - a*x_mean
b
-1.0232558139534875
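The loop above can also be written with NumPy array operations. The sketch below is one possible vectorized version, not part of the original notebook; the names `a_vec`, `b_vec`, `a_fit` and `b_fit` are introduced here for illustration. It also cross-checks the result against `np.polyfit` with degree 1, which fits the same least squares line.

```python
import numpy as np

X_train = np.array([1, 2, 3, 5, 6])
Y_train = np.array([2, 4, 5, 11, 14])

# Center both variables around their means.
x_centered = X_train - X_train.mean()
y_centered = Y_train - Y_train.mean()

# Slope and intercept from the least squares formulas, without a Python loop.
a_vec = np.dot(x_centered, y_centered) / np.dot(x_centered, x_centered)
b_vec = Y_train.mean() - a_vec * X_train.mean()

# Cross-check: a degree-1 polynomial fit returns the slope first, then the intercept.
a_fit, b_fit = np.polyfit(X_train, Y_train, deg=1)

print(a_vec, b_vec)
print(a_fit, b_fit)
```

Both pairs should agree with the values of a and b computed by the loop, up to floating point rounding.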
The fitted line
y_linear = [a * x + b for x in X_train]
plt.scatter(X_train,Y_train)
plt.plot(X_train,y_linear,color="orange")
[<matplotlib.lines.Line2D at 0x2a5715a7ba8>]
![Scatter plot of the training data with the fitted line](https://img-blog.csdnimg.cn/20190225010744134.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzM3OTgyMTA5,size_16,color_FFFFFF,t_70)
Making a prediction
x_predict = 3
y_predict = a*x_predict + b
plt.scatter(X_train,Y_train)
plt.plot(X_train,y_linear,color="orange")
plt.scatter(x_predict,y_predict,marker="x",color="r")
<matplotlib.collections.PathCollection at 0x2a571674550>
The plot above shows the gap between the predicted value (the red x) and the actual value.
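The same gap can be read off numerically as a residual, the actual value minus the predicted value. This is a small sketch using the a and b printed earlier and the training point (3, 5); the names `y_actual` and `residual` are introduced here for illustration.

```python
a, b = 2.4186046511627906, -1.0232558139534875  # values computed earlier

x_predict = 3
y_actual = 5                      # the training target at x = 3
y_predict = a * x_predict + b     # value on the fitted line

# Negative residual: the line sits above the actual point at x = 3.
residual = y_actual - y_predict
print(residual)
```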
Let's quantify it with R squared
from sklearn.metrics import r2_score
From the training data above, the y that corresponds to x = 3 is 5
Y_test = np.array([5])
Y_predict = np.array([y_predict])
len(Y_predict.shape)
1
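# r2_score expects array-like arguments with at least one dimension. The call below mistakenly passes the scalar y_predict instead of the array Y_predict, which causes the error.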
r2_score(Y_test,y_predict)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-1748b726d92c> in <module>
----> 1 r2_score(Y_test,y_predict)
~\Anaconda3\lib\site-packages\sklearn\metrics\regression.py in r2_score(y_true, y_pred, sample_weight, multioutput)
532 """
533 y_type, y_true, y_pred, multioutput = _check_reg_targets(
--> 534 y_true, y_pred, multioutput)
535 check_consistent_length(y_true, y_pred, sample_weight)
536
~\Anaconda3\lib\site-packages\sklearn\metrics\regression.py in _check_reg_targets(y_true, y_pred, multioutput)
73
74 """
---> 75 check_consistent_length(y_true, y_pred)
76 y_true = check_array(y_true, ensure_2d=False)
77 y_pred = check_array(y_pred, ensure_2d=False)
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
229 """
230
--> 231 lengths = [_num_samples(X) for X in arrays if X is not None]
232 uniques = np.unique(lengths)
233 if len(uniques) > 1:
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in <listcomp>(.0)
229 """
230
--> 231 lengths = [_num_samples(X) for X in arrays if X is not None]
232 uniques = np.unique(lengths)
233 if len(uniques) > 1:
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _num_samples(x)
140 if len(x.shape) == 0:
141 raise TypeError("Singleton array %r cannot be considered"
--> 142 " a valid collection." % x)
143 # Check that shape is returning an integer or default to len
144 # Dask dataframes may not return numeric shape[0] value
TypeError: Singleton array 6.232558139534884 cannot be considered a valid collection.
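The traceback comes down to the number of dimensions: a bare number has zero dimensions, while the validation code needs something it can take the length of. As a sketch of one way to guard against this (not the approach used in the notebook, which builds the array by hand), `np.atleast_1d` wraps a scalar into a one-element array. The names `scalar_ndim` and `as_array` are introduced here for illustration.

```python
import numpy as np

y_predict = 6.232558139534884  # the scalar reported in the traceback

# A scalar has zero dimensions, which is what the validation rejects.
scalar_ndim = np.ndim(y_predict)

# Wrapping it yields a 1-D array of length one, which has a valid sample count.
as_array = np.atleast_1d(y_predict)

print(scalar_ndim, as_array.shape)
```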
r2_score(Y_test,Y_predict)
0.0
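# This result is a bit awkward: with only one test sample the total sum of squares is zero, so R squared is not well defined, and r2_score falls back to 0.0 here.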
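A single point cannot give a meaningful R squared, because the variance of one value is zero. One option, sketched below, is to score the fit over all five training points instead. This measures how well the line fits the data it was trained on, not how well it generalizes. The names `Y_fitted`, `ss_res`, `ss_tot`, `r2_manual` and `r2_sklearn` are introduced here for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

X_train = np.array([1, 2, 3, 5, 6])
Y_train = np.array([2, 4, 5, 11, 14])
a, b = 2.4186046511627906, -1.0232558139534875  # values computed earlier

# Predictions of the fitted line at every training x.
Y_fitted = a * X_train + b

# R squared by hand: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((Y_train - Y_fitted) ** 2)
ss_tot = np.sum((Y_train - Y_train.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The same quantity from scikit-learn, with two 1-D arrays of equal length.
r2_sklearn = r2_score(Y_train, Y_fitted)

print(r2_manual, r2_sklearn)
```

Both values should agree and come out at roughly 0.98, which matches the visual impression that the line follows the points closely.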