[Python] Linear Regression Examples implemented in Python (reference)

Simple linear regression

This section implements a simple example of linear regression in Python.

We assume that the two variables are linearly related. We therefore try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature, or independent variable (x). Consider a data set with one value of y for each value of x:

x:  0  1  2  3  4  5  6  7   8   9
y:  1  3  2  5  7  8  8  9  10  12

In general, we define:

x as the feature vector, i.e., x = [x_1, x_2, ..., x_n],

y as the response vector, i.e., y = [y_1, y_2, ..., y_n],

for n observations (in the example above, n = 10).
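With this notation, the fit can be written down explicitly. The block below is the standard least-squares closed form for a single feature. The symbols b_0, b_1, SS_xy and SS_xx are chosen to match the names used in the estimate_coef function of the listing in this section, and \bar{x}, \bar{y} denote the sample means; the residual-sum objective is stated here as an assumption about what that function is meant to minimize.

```latex
\begin{align}
\hat{y}_i &= b_0 + b_1 x_i, \qquad i = 1, \dots, n \\
(b_0, b_1) &= \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\
SS_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y} \\
SS_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2 \\
b_1 &= \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}
\end{align}
```

For the ten observations above, \bar{x} = 4.5 and \bar{y} = 6.5, so SS_xy = 389 - 292.5 = 96.5 and SS_xx = 285 - 202.5 = 82.5, which gives b_1 ≈ 1.1697 and b_0 ≈ 1.2364.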

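The following Python program estimates the regression coefficients for this data set and draws the data as a scatter plot together with the fitted line.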


# -*- coding: utf-8 -*-
"""
Created on Thu Mar 26 18:48:49 2020

@author: Bean029
"""

import numpy as np 
import matplotlib.pyplot as plt 
  
def estimate_coef(x, y): 
    # number of observations/points 
    n = np.size(x) 
  
    # mean of x and y vector 
    m_x, m_y = np.mean(x), np.mean(y) 
  
    # calculating cross-deviation and deviation about x 
    SS_xy = np.sum(y*x) - n*m_y*m_x 
    SS_xx = np.sum(x*x) - n*m_x*m_x 
  
    # calculating regression coefficients 
    b_1 = SS_xy / SS_xx 
    b_0 = m_y - b_1*m_x 
  
    return(b_0, b_1) 
  
def plot_regression_line(x, y, b): 
    # plotting the actual points as scatter plot 
    plt.scatter(x, y, color = "m", 
               marker = "o", s = 30) 
  
    # predicted response vector 
    y_pred = b[0] + b[1]*x 
  
    # plotting the regression line 
    plt.plot(x, y_pred, color = "g") 
  
    # putting labels 
    plt.xlabel('x') 
    plt.ylabel('y') 
  
    # function to show plot 
    plt.show() 
  
def main(): 
    # observations 
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12]) 
  
    # estimating coefficients 
    b = estimate_coef(x, y) 
    print("Estimated coefficients:\nb_0 = {}  \nb_1 = {}".format(b[0], b[1])) 
  
    # plotting regression line 
    plot_regression_line(x, y, b) 
  
if __name__ == "__main__": 
    main() 

The program produces a scatter plot of the data with the fitted line drawn through it.

The task is to find the line that best fits the points in the scatter plot, so that we can predict the response for any new feature value (i.e., a value of x that does not appear in the data set).

This line is called the regression line.
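As a cross-check of the regression line, the sketch below refits the same ten observations with NumPy's np.polyfit (a degree-1 polynomial fit, which is an ordinary least-squares line) and then predicts the response for a feature value that is not in the data set. The value x_new = 10 is an arbitrary choice for illustration and does not come from the original example.

```python
import numpy as np

# the ten observations from the example
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# a degree-1 polynomial fit returns [slope, intercept]
b_1, b_0 = np.polyfit(x, y, 1)
print("b_0 = {:.4f}, b_1 = {:.4f}".format(b_0, b_1))

# predict the response for a new feature value (x_new is an assumed example)
x_new = 10
y_new = b_0 + b_1 * x_new
print("predicted y at x = {}: {:.4f}".format(x_new, y_new))
```

The two coefficients should agree with the values returned by estimate_coef up to floating-point rounding, since both solve the same least-squares problem.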


Multiple linear regression

Multiple linear regression attempts to model the relationship between two or more features and a response by fitting a linear equation to the observed data.

Obviously, this is just an extension of simple linear regression.

Consider a data set with p features (or independent variables) and one response (or dependent variable).

The data set also contains n rows (observations).
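For p features the model becomes y ≈ b_0 + b_1 x_1 + ... + b_p x_p, and the coefficients can again be chosen by least squares. The sketch below illustrates this with NumPy alone on a small synthetic data set. The data, the "true" coefficients, and the helper name fit_multiple are all made up for illustration and are not part of the original example.

```python
import numpy as np

def fit_multiple(X, y):
    """Least-squares fit of y ~ b_0 + X @ b; returns (b_0, b)."""
    n = X.shape[0]
    # prepend a column of ones so the intercept is estimated as well
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# synthetic data: n = 50 observations, p = 3 features (assumed values)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_b0, true_b = 2.0, np.array([1.5, -0.5, 3.0])
y = true_b0 + X @ true_b  # noise-free, so the fit should recover the coefficients

b_0, b = fit_multiple(X, y)
print("intercept:", b_0)
print("coefficients:", b)
```

Because the synthetic response contains no noise, the recovered intercept and coefficients should match the assumed ones up to rounding. With real data they would only be estimates.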

# -*- coding: utf-8 -*-
"""
Created on Thu Mar 26 18:53:13 2020

@author: Bean029
"""

import matplotlib.pyplot as plt 
import numpy as np 
from sklearn import datasets, linear_model, metrics 
  
# load the boston dataset
# note: load_boston was removed in scikit-learn 1.2, so this call needs an older release
boston = datasets.load_boston(return_X_y=False)
  
# defining feature matrix(X) and response vector(y) 
X = boston.data 
y = boston.target 
  
# splitting X and y into training and testing sets 
from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, 
                                                    random_state=1) 
  
# create linear regression object 
reg = linear_model.LinearRegression() 
  
# train the model using the training sets 
reg.fit(X_train, y_train) 
  
# regression coefficients 
print('Coefficients: \n', reg.coef_) 
  
# R^2 score (coefficient of determination) on the test set: 1 means perfect prediction
print('Variance score: {}'.format(reg.score(X_test, y_test))) 
  
# plot for residual error 
  
## setting plot style 
plt.style.use('fivethirtyeight') 
  
## plotting residual errors in training data 
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train, 
            color = "red", s = 10, label = 'Train data') 
  
## plotting residual errors in test data 
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test, 
            color = "blue", s = 10, label = 'Test data') 
  
## plotting line for zero residual error 
plt.hlines(y = 0, xmin = 0, xmax = 50, linewidth = 2) 
  
## plotting legend 
plt.legend(loc = 'upper right') 
  
## plot title 
plt.title("Residual errors") 
  
## function to show plot 
plt.show() 

Running the program prints the regression coefficients and the variance score, and then displays the residual error plot.
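The number that reg.score reports for a LinearRegression model is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below computes it by hand with NumPy. The function name r2 and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # sum of squared residuals between observed and predicted values
    ss_res = np.sum((y_true - y_pred) ** 2)
    # total sum of squares around the mean of the observed values
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]        # made-up observations
perfect = r2(y_true, y_true)         # predictions equal to the observations
baseline = r2(y_true, [6.0] * 4)     # always predicting the mean of y_true
print("perfect:", perfect)
print("baseline:", baseline)
```

A perfect prediction scores 1, while a model that always predicts the mean of the observed values scores 0, which is why values close to 1 indicate a good fit.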

 

Origin blog.csdn.net/seagal890/article/details/105125566