Machine Learning Basics (7): Linear Regression, Part 1


Simple linear regression algorithm

import numpy as np
from matplotlib import pyplot as plt

X_train = np.array([1,2,3,5,6])

Y_train = np.array([2,4,5,11,14])

plt.scatter(X_train,Y_train)
<matplotlib.collections.PathCollection at 0x2a571552748>

[Figure: scatter plot of the training data]

Least-squares formulas

$$a = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

$$b = \bar{y} - a\bar{x}$$
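
Where these formulas come from can be sketched in a few lines. The outline below assumes the usual least-squares objective, the sum of squared vertical residuals, written here as $J(a,b)$ (a symbol the original post does not use). It is a sketch, not a full proof.

$$
\begin{aligned}
J(a,b) &= \sum_{i=1}^{n}\bigl(y_i - a x_i - b\bigr)^2 \\
\frac{\partial J}{\partial b} = 0 &\;\Rightarrow\; \sum_{i=1}^{n}(y_i - a x_i - b) = 0 \;\Rightarrow\; b = \bar{y} - a\bar{x} \\
\frac{\partial J}{\partial a} = 0 &\;\Rightarrow\; \sum_{i=1}^{n} x_i\bigl(y_i - \bar{y} - a(x_i - \bar{x})\bigr) = 0 \\
&\;\Rightarrow\; a = \frac{\sum_{i=1}^{n} x_i (y_i-\bar{y})}{\sum_{i=1}^{n} x_i (x_i-\bar{x})} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
\end{aligned}
$$

The last equality holds because $\sum_i \bar{x}(y_i-\bar{y}) = 0$ and $\sum_i \bar{x}(x_i-\bar{x}) = 0$, so subtracting these terms changes neither the numerator nor the denominator.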

x_mean = np.mean(X_train)
y_mean = np.mean(Y_train)
Compute a from the formula

numerator = 0.0
denominator = 0.0

for x,y in zip(X_train,Y_train):
    numerator += (x-x_mean) * (y-y_mean)
    denominator += (x-x_mean)**2

a = numerator / denominator

a
2.4186046511627906

b = y_mean - a*x_mean

b
-1.0232558139534875
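
The loop above can also be written with NumPy array operations. The following is a minimal sketch, not the author's method: the names `x_dev`, `y_dev`, `a_vec` and `b_vec` are introduced here for illustration, and `np.polyfit` is used only as an independent cross-check of the hand-computed coefficients.

```python
import numpy as np

X_train = np.array([1, 2, 3, 5, 6])
Y_train = np.array([2, 4, 5, 11, 14])

# Deviations from the means, computed for all samples at once
x_dev = X_train - X_train.mean()
y_dev = Y_train - Y_train.mean()

# Same formulas as the loop, written as dot products
a_vec = x_dev.dot(y_dev) / x_dev.dot(x_dev)
b_vec = Y_train.mean() - a_vec * X_train.mean()

# Cross-check: np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(X_train, Y_train, 1)

print(a_vec, b_vec)
print(np.allclose([a_vec, b_vec], [slope, intercept]))
```

The dot-product form avoids the explicit Python loop, which matters little for five points but helps on larger arrays.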
Equation of the fitted line

y_linear = [a * x + b for x in X_train]

plt.scatter(X_train,Y_train)
plt.plot(X_train,y_linear,color="orange")
[<matplotlib.lines.Line2D at 0x2a5715a7ba8>]


![Fitted line drawn over the training data](https://img-blog.csdnimg.cn/20190225010744134.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzM3OTgyMTA5,size_16,color_FFFFFF,t_70)

Making a prediction

x_predict = 3
y_predict = a*x_predict + b

plt.scatter(X_train,Y_train)
plt.plot(X_train,y_linear,color="orange")
plt.scatter(x_predict,y_predict,marker="x",color="r")
<matplotlib.collections.PathCollection at 0x2a571674550>

[Figure: training data, fitted line, and the prediction at x = 3 marked with a red x]
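
Predicting for several inputs at once follows the same formula. This is a small sketch under stated assumptions: `predict` is a hypothetical helper name, and the coefficients are copied from the outputs printed earlier instead of being recomputed.

```python
import numpy as np

# Coefficients copied from the outputs above
a = 2.4186046511627906
b = -1.0232558139534875

def predict(x_new):
    """Hypothetical helper: evaluate the fitted line at one or more x values."""
    return a * np.asarray(x_new, dtype=float) + b

single = predict(3)          # 0-d array for a scalar input
batch = predict([3, 4, 7])   # one prediction per input value

print(float(single))
print(batch)
```

Note that a scalar input yields a 0-d array, so it needs wrapping (for example with `np.array([...])`) before being passed to functions that expect a collection.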

The plot above shows the gap between the predicted value (the red x) and the actual value.
Let's quantify it with R squared.

from sklearn.metrics import r2_score
From the data above, the actual y at x = 3 is 5.

Y_test = np.array([5])
Y_predict = np.array([y_predict])
len(Y_predict.shape)
1

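# r2_score expects array-like arguments (at least 1-D). The scalar y_predict was passed here by mistake instead of the array Y_predict, which raises the error below.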
r2_score(Y_test,y_predict)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-1748b726d92c> in <module>
----> 1 r2_score(Y_test,y_predict)

~\Anaconda3\lib\site-packages\sklearn\metrics\regression.py in r2_score(y_true, y_pred, sample_weight, multioutput)
    532     """
    533     y_type, y_true, y_pred, multioutput = _check_reg_targets(
--> 534         y_true, y_pred, multioutput)
    535     check_consistent_length(y_true, y_pred, sample_weight)
    536 

~\Anaconda3\lib\site-packages\sklearn\metrics\regression.py in _check_reg_targets(y_true, y_pred, multioutput)
     73 
     74     """
---> 75     check_consistent_length(y_true, y_pred)
     76     y_true = check_array(y_true, ensure_2d=False)
     77     y_pred = check_array(y_pred, ensure_2d=False)

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    229     """
    230 
--> 231     lengths = [_num_samples(X) for X in arrays if X is not None]
    232     uniques = np.unique(lengths)
    233     if len(uniques) > 1:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in <listcomp>(.0)
    229     """
    230 
--> 231     lengths = [_num_samples(X) for X in arrays if X is not None]
    232     uniques = np.unique(lengths)
    233     if len(uniques) > 1:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _num_samples(x)
    140         if len(x.shape) == 0:
    141             raise TypeError("Singleton array %r cannot be considered"
--> 142                             " a valid collection." % x)
    143         # Check that shape is returning an integer or default to len
    144         # Dask dataframes may not return numeric shape[0] value

TypeError: Singleton array 6.232558139534884 cannot be considered a valid collection.


r2_score(Y_test,Y_predict)
0.0
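# This result is a bit awkward: with a single sample the total sum of squares of y_true is zero, so R squared is not well defined.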
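
To get a meaningful score, R squared needs more than one sample. The sketch below computes it by hand over the whole training set, using the definition R² = 1 − SS_res / SS_tot. `r_squared` is a hypothetical helper written for this illustration, not scikit-learn's implementation. Returning `nan` in the degenerate case is a choice made here and differs from the `0.0` shown above. The training data is reused for scoring only to illustrate the calculation; a held-out test set would be the usual choice.

```python
import numpy as np

X_train = np.array([1, 2, 3, 5, 6])
Y_train = np.array([2, 4, 5, 11, 14])

# Coefficients copied from the outputs above
a = 2.4186046511627906
b = -1.0232558139534875

def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    if ss_tot == 0:
        # Degenerate case, e.g. a single sample: R^2 is undefined
        return float("nan")
    return 1.0 - ss_res / ss_tot

r2_train = r_squared(Y_train, a * X_train + b)
r2_single = r_squared([5], [a * 3 + b])

print(r2_train)
print(r2_single)
```

On the five training points the score is close to 1, which matches the visual impression that the line fits the data well.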
