Linear Regression Examples Implemented in Python (for reference)
Simple linear regression
A simple example of linear regression implemented in Python.
Simple linear regression predicts a response from a single feature. It assumes that the two variables are linearly related, so we try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature, or independent variable (x). Consider the following data set, which gives a response y for each value of x:
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9  |
y | 1 | 3 | 2 | 5 | 7 | 8 | 8 | 9 | 10 | 12 |
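In general, we define:
x as the feature vector, i.e., x = [x_1, x_2, ..., x_n],
y as the response vector, i.e., y = [y_1, y_2, ..., y_n],
for n observations (in the example above, n = 10).
The following Python program estimates the regression coefficients for this data set and plots the data as a scatter plot together with the resulting regression line.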
# -*- coding: utf-8 -*-
"""
Created on Thu Mar 26 18:48:49 2020
@author: Bean029
"""
import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x, m_y = np.mean(x), np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return b_0, b_1
def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()
def main():
    # observations
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)
if __name__ == "__main__":
    main()
The program produces the following scatter plot:
Now our task is to find the line that best fits the scatter plot above, so that we can predict the response for any new feature value (i.e., a value of x that does not appear in the data set).
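One standard way to make "best fits" precise, and the one the estimate_coef function above implements, is ordinary least squares: choose b_0 and b_1 so that the sum of squared residuals e_i is as small as possible. The formulas below use the same SS_xy and SS_xx quantities as the code; the worked numbers for the example table are added here for illustration.

```latex
h(x_i) = b_0 + b_1 x_i, \qquad e_i = y_i - h(x_i), \qquad
\min_{b_0,\, b_1} \sum_{i=1}^{n} e_i^2

b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}, \qquad
SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}, \qquad
SS_{xx} = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2

% Example table: n = 10, \bar{x} = 4.5, \bar{y} = 6.5, \sum x_i y_i = 389, \sum x_i^2 = 285
SS_{xy} = 389 - 10(4.5)(6.5) = 96.5, \qquad SS_{xx} = 285 - 10(4.5)^2 = 82.5
b_1 = \frac{96.5}{82.5} \approx 1.1697, \qquad b_0 = 6.5 - 1.1697(4.5) \approx 1.2364
```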
This line is called the regression line.
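Once b_0 and b_1 are known, the regression line can be evaluated at values of x that are not in the table. The sketch below recomputes the coefficients with the same formulas as estimate_coef, cross-checks them against NumPy's np.polyfit (with degree 1 it returns the slope first, then the intercept), and evaluates the line with a hypothetical helper predict at x = 10 and x = 12.5, two points chosen here only for illustration.

```python
import numpy as np

# Data from the table above
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Closed-form least-squares coefficients (same formulas as estimate_coef)
n = np.size(x)
m_x, m_y = np.mean(x), np.mean(y)
b_1 = (np.sum(x * y) - n * m_x * m_y) / (np.sum(x * x) - n * m_x * m_x)
b_0 = m_y - b_1 * m_x

# Cross-check: a degree-1 polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
assert np.isclose(slope, b_1) and np.isclose(intercept, b_0)


def predict(x_new, b_0, b_1):
    """Hypothetical helper: evaluate the regression line b_0 + b_1 * x."""
    return b_0 + b_1 * np.asarray(x_new, dtype=float)


# x = 10 and x = 12.5 do not appear in the data set
print(predict([10, 12.5], b_0, b_1))  # approximately [12.933, 15.858]
```

The closed form is used instead of polyfit in the main path only to mirror the article's own formulas; polyfit serves as an independent check.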
Multiple linear regression
Multiple linear regression attempts to model the relationship between two or more features and a response by fitting a linear equation to the observed data.
Clearly, this is simply an extension of simple linear regression.
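One way to see the "extension" concretely: prepend a column of ones to the feature matrix and solve a single least-squares problem over all columns at once. With one feature this reproduces the simple-regression coefficients; with more features it returns one coefficient per feature. The helper fit_least_squares and the synthetic two-feature data (generated from y = 2 + 3*x1 - x2 with no noise) are assumptions made for this sketch, which uses np.linalg.lstsq rather than the scikit-learn estimator used later.

```python
import numpy as np


def fit_least_squares(X, y):
    """Hypothetical helper: least-squares fit of y ~ b_0 + b_1*x_1 + ... + b_p*x_p.

    X has shape (n, p); a column of ones is prepended for the intercept b_0.
    Returns the array [b_0, b_1, ..., b_p].
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # treat a 1-D input as a single feature
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


# One feature: the simple-regression table
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
print(fit_least_squares(x, y))  # approximately [1.2364, 1.1697]

# Two features: noise-free synthetic data from y = 2 + 3*x1 - 1*x2
rng = np.random.default_rng(0)
X2 = rng.uniform(0, 10, size=(50, 2))
y2 = 2 + 3 * X2[:, 0] - 1 * X2[:, 1]
print(fit_least_squares(X2, y2))  # approximately [2, 3, -1]
```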
Consider a data set with p features (independent variables) and one response (dependent variable).
The data set contains n rows (observations).
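In common matrix notation (introduced here as a sketch, not taken from the original), the model for observation i has one coefficient per feature, and stacking all n observations gives a least-squares problem whose solution satisfies the normal equations. With p = 1 it reduces to the simple case above. scikit-learn's LinearRegression, used below, fits this same ordinary least-squares objective; it takes X without the column of ones and reports b_0 separately as intercept_.

```latex
h(x_i) = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_p x_{ip}, \qquad
y_i = h(x_i) + e_i, \quad i = 1, \dots, n

y = X b + e, \qquad
X = \begin{bmatrix}
1 & x_{11} & \cdots & x_{1p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \cdots & x_{np}
\end{bmatrix}, \qquad
b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{bmatrix}

\min_{b} \lVert y - X b \rVert^2
\;\Longrightarrow\; X^\top X\, b = X^\top y
\;\Longrightarrow\; b = (X^\top X)^{-1} X^\top y \quad (\text{if } X^\top X \text{ is invertible})
```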
# -*- coding: utf-8 -*-
"""
Created on Thu Mar 26 18:53:13 2020
@author: Bean029
"""
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, metrics
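# load the Boston housing dataset
# note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this line requires scikit-learn < 1.2
boston = datasets.load_boston(return_X_y=False)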
# defining feature matrix(X) and response vector(y)
X = boston.data
y = boston.target
# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)
# create linear regression object
reg = linear_model.LinearRegression()
# train the model using the training sets
reg.fit(X_train, y_train)
# regression coefficients
print('Coefficients: \n', reg.coef_)
# score() returns R^2, the coefficient of determination, on the test set: 1 means perfect prediction
print('Variance score: {}'.format(reg.score(X_test, y_test)))
# plot for residual error
## setting plot style
plt.style.use('fivethirtyeight')
## plotting residual errors in training data
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train,
            color = "red", s = 10, label = 'Train data')
## plotting residual errors in test data
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test,
            color = "blue", s = 10, label = 'Test data')
## plotting line for zero residual error
plt.hlines(y = 0, xmin = 0, xmax = 50, linewidth = 2)
## plotting legend
plt.legend(loc = 'upper right')
## plot title
plt.title("Residual errors")
## function to show plot
plt.show()
The program produces the following output: