[Python] A Linear Regression Example in Python (with Figures)


Simple Linear Regression

This post works through a simple linear regression example implemented in Python.

Suppose the two variables below are linearly related. We therefore try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature, or independent variable (x).

x:  0  1  2  3  4  5  6  7  8   9
y:  1  3  2  5  7  8  8  9  10  12

In general, we define:

x as the feature vector, i.e. x = [x_1, x_2, …, x_n], and

y as the response vector, i.e. y = [y_1, y_2, …, y_n],

for n observations (in the example above, n = 10).
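The line is fitted by ordinary least squares. In the notation of the script below, where m_x and m_y are the sample means of x and y, the closed-form estimates of the intercept b_0 and slope b_1 are:

SS_xy = Σ x_i·y_i − n·m_x·m_y
SS_xx = Σ x_i² − n·m_x²

b_1 = SS_xy / SS_xx
b_0 = m_y − b_1·m_x

These are exactly the quantities computed in estimate_coef below.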

The following Python script computes these estimates and plots the data together with the fitted regression line.


# -*- coding: utf-8 -*-
"""
Created on Thu Mar 26 18:48:49 2020

@author: Bean029
"""

import numpy as np 
import matplotlib.pyplot as plt 
  
def estimate_coef(x, y): 
    # number of observations/points 
    n = np.size(x) 
  
    # mean of x and y vector 
    m_x, m_y = np.mean(x), np.mean(y) 
  
    # calculating cross-deviation and deviation about x 
    SS_xy = np.sum(y*x) - n*m_y*m_x 
    SS_xx = np.sum(x*x) - n*m_x*m_x 
  
    # calculating regression coefficients 
    b_1 = SS_xy / SS_xx 
    b_0 = m_y - b_1*m_x 
  
    return b_0, b_1
  
def plot_regression_line(x, y, b): 
    # plotting the actual points as scatter plot 
    plt.scatter(x, y, color = "m", 
               marker = "o", s = 30) 
  
    # predicted response vector 
    y_pred = b[0] + b[1]*x 
  
    # plotting the regression line 
    plt.plot(x, y_pred, color = "g") 
  
    # putting labels 
    plt.xlabel('x') 
    plt.ylabel('y') 
  
    # function to show plot 
    plt.show() 
  
def main(): 
    # observations 
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12]) 
  
    # estimating coefficients 
    b = estimate_coef(x, y) 
    print("Estimated coefficients:\nb_0 = {}  \nb_1 = {}".format(b[0], b[1])) 
  
    # plotting regression line 
    plot_regression_line(x, y, b) 
  
if __name__ == "__main__": 
    main() 
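As a quick sanity check, the coefficients can be worked out by hand for this dataset: m_x = 4.5, m_y = 6.5, Σ x_i·y_i = 389 and Σ x_i² = 285, so SS_xy = 389 − 10·4.5·6.5 = 96.5 and SS_xx = 285 − 10·4.5² = 82.5, which gives b_1 = 96.5/82.5 ≈ 1.1697 and b_0 = 6.5 − 1.1697·4.5 ≈ 1.2364. These are the values the script prints.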

The resulting scatter plot, with the fitted regression line drawn through it, is shown below:

Now our task is to find the line that best fits the scatter plot above, so that we can predict the response for any new feature value, i.e. an x value that does not appear in the dataset.

This line is called the regression line.
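With the coefficients in hand, predicting the response for a new feature value is a one-line computation. Below is a minimal sketch that reuses estimate_coef from the script above (x_new = 10 is an arbitrary value chosen to lie outside the dataset):

import numpy as np  # same import as in the script above

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
b_0, b_1 = estimate_coef(x, y)  # estimate_coef as defined above

x_new = 10                      # a feature value not present in the dataset
y_new = b_0 + b_1 * x_new       # predicted response, approximately 12.93
print(y_new)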


Multiple Linear Regression

Multiple linear regression attempts to model the relationship between two or more features and a response by fitting a linear equation to the observed data.

Clearly, it is just an extension of simple linear regression.

Consider a dataset with p features (or independent variables) and one response (or dependent variable).

The dataset also contains n rows/observations.
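For such data the fitted model is a hyperplane rather than a line; in the same notation as before, its standard form (stated here for reference) is

y_i = b_0 + b_1·x_i1 + b_2·x_i2 + … + b_p·x_ip + e_i,  for i = 1, …, n,

where b_0, …, b_p are the regression coefficients and e_i is the residual error of the i-th observation. The scikit-learn script below estimates these coefficients on the Boston housing dataset.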

# -*- coding: utf-8 -*-
"""
Created on Thu Mar 26 18:53:13 2020

@author: Bean029
"""

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, metrics
from sklearn.model_selection import train_test_split
  
# load the boston dataset
# (note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this snippet assumes an older scikit-learn version)
boston = datasets.load_boston(return_X_y=False)
  
# defining feature matrix(X) and response vector(y) 
X = boston.data 
y = boston.target 
  
# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)
  
# create linear regression object 
reg = linear_model.LinearRegression() 
  
# train the model using the training sets 
reg.fit(X_train, y_train) 
  
# regression coefficients 
print('Coefficients: \n', reg.coef_) 
  
# R^2 score (coefficient of determination): 1 means perfect prediction
print('R^2 score: {}'.format(reg.score(X_test, y_test)))
  
# plot for residual error 
  
## setting plot style 
plt.style.use('fivethirtyeight') 
  
## plotting residual errors in training data 
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train, 
            color = "red", s = 10, label = 'Train data') 
  
## plotting residual errors in test data 
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test, 
            color = "blue", s = 10, label = 'Test data') 
  
## plotting line for zero residual error 
plt.hlines(y = 0, xmin = 0, xmax = 50, linewidth = 2) 
  
## plotting legend 
plt.legend(loc = 'upper right') 
  
## plot title 
plt.title("Residual errors") 
  
## function to show plot 
plt.show() 
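The script imports sklearn's metrics module but never uses it. A natural extension, sketched here as an assumption about what was intended, is to also report the mean squared error on the test set, appended after the model is fitted:

# mean squared error of the predictions on the test set
mse = metrics.mean_squared_error(y_test, reg.predict(X_test))
print('Mean squared error: {:.2f}'.format(mse))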

The residual-error plot produced by the script is shown below:


Reposted from blog.csdn.net/seagal890/article/details/105125566