Machine Learning Algorithms in Theory and Practice (2): Linear Regression

Table of Contents:

I. Introduction

1. Types of linear regression

2. Assumptions

II. Steps for building a regressor in Python

III. Simple linear regression with Python

1. Simulate data and plot

2. Simple linear regression procedure

3. Linear regression with scikit-learn

IV. Multiple linear regression with Python

1. Load the Boston Housing dataset

2. Split the data set into training and test sets

3. Compute the coefficients and intercept

4. Draw a residual scatter plot

V. Applications


I. Introduction

Linear regression is a statistical model that analyzes the linear relationship between a dependent variable and a given set of independent variables. A linear relationship means that when the value of one or more independent variables changes (increases or decreases), the dependent variable changes correspondingly (increases or decreases).

The mathematical relationship can be expressed by means of the following equation:

Y = aX + b

Here, Y is the dependent variable we are trying to predict, X is the independent variable we use to make the prediction, a is the slope of the regression line, and b is a constant called the intercept.

1. Types of linear regression

  • Simple linear regression
  • Multiple linear regression

2. Assumptions

A linear regression model makes the following assumptions about the data set it is built on:

Multicollinearity: the model assumes little or no multicollinearity in the data. Multicollinearity occurs when the independent variables (features) are correlated with each other.

Autocorrelation: the model assumes little or no autocorrelation in the data. Autocorrelation occurs when the residual errors depend on each other.

Relationship between the variables: the model assumes that the relationship between the features and the response variable is linear.
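The first two assumptions can be checked numerically. The following is a minimal sketch rather than part of the workflow described here: it runs on synthetic data (the data and all variable names are assumptions made for illustration), inspects pairwise feature correlations as a rough multicollinearity check, and computes the Durbin-Watson statistic on residuals as an autocorrelation check. Durbin-Watson values near 2 suggest little first-order autocorrelation, while values near 0 suggest strong positive autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): x2 is almost a copy of x1, x3 is independent
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Multicollinearity check: pairwise correlations between features.
# Off-diagonal entries close to +1 or -1 indicate strongly correlated features.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))


# Autocorrelation check: Durbin-Watson statistic of the residuals,
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
def durbin_watson(residuals):
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)


independent_resid = rng.normal(size=200)             # no autocorrelation
correlated_resid = np.cumsum(rng.normal(size=200))   # random walk, strongly autocorrelated

dw_independent = durbin_watson(independent_resid)
dw_correlated = durbin_watson(correlated_resid)
print(dw_independent, dw_correlated)
```

In this sketch the correlation between x1 and x2 is close to 1, which is the kind of pattern that signals multicollinearity in a real feature matrix.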

II. Steps for building a regressor in Python

scikit-learn, a Python library for machine learning, can also be used to build a regressor in Python.

In the following example, we build a basic regression model that fits a line to the data, i.e. a linear regressor. The steps needed to build a regressor in Python are:

Step 1: Import the necessary Python packages.

Step 2: Import the data set.

Step 3: Split the data into training and test sets.

Step 4: Build the model and make predictions.

Step 5: Plot and visualize the results.

Step 6: Compute performance. Performance metrics include the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R squared.
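The metrics in Step 6 can be sketched as follows. This is an illustration under stated assumptions: the arrays y_true and y_pred hold made-up numbers, and the hand-written formulas are cross-checked against the corresponding functions in sklearn.metrics.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Illustrative true values and predictions (made-up numbers)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Metrics computed by hand
mse = np.mean((y_true - y_pred) ** 2)        # mean squared error
rmse = np.sqrt(mse)                          # root mean squared error
mae = np.mean(np.abs(y_true - y_pred))       # mean absolute error
# R squared: 1 - (residual sum of squares) / (total sum of squares)
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)

print(mse, rmse, mae, r2)

# The same metrics from scikit-learn, for comparison
print(mean_squared_error(y_true, y_pred),
      mean_absolute_error(y_true, y_pred),
      r2_score(y_true, y_pred))
```

RMSE is in the same units as the response, which often makes it easier to interpret than MSE.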

III. Simple linear regression with Python

1. Simulate data and plot

# Import packages
import numpy as np
import matplotlib.pyplot as plt

# Simulated data
x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 2., 3., 5.])

# Draw a scatter plot
plt.scatter(x, y)
plt.axis([0, 6, 0, 6]) # set the x and y axis ranges
plt.show()

2. Simple linear regression procedure

① Compute the slope and intercept of the regression line

The slope and intercept are given by the least-squares formulas (note: here the slope is a and the intercept is b):
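For reference, the least-squares estimates that the code in this step implements can be written as follows. The symbols \bar{x} and \bar{y} denote the sample means; they are chosen here to match the variable names x_mean and y_mean.

```latex
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
```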

# Compute the means of x and y
x_mean = np.mean(x)
y_mean = np.mean(y)

# Compute the slope a and the intercept b
num = 0.0
d = 0.0
for x_i, y_i in zip(x, y):
    num += (x_i - x_mean) * (y_i - y_mean)
    d += (x_i - x_mean) ** 2

a = num / d
b = y_mean - a * x_mean

print(a)
print(b)
0.8
0.39999999999999947
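The explicit loop can also be written in vectorized form. The following is a sketch on the same five data points; the cross-check with np.polyfit is an addition made for illustration and is not part of the procedure above.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 2., 3., 5.])

x_mean = np.mean(x)
y_mean = np.mean(y)

# Vectorized form of the loop: the dot product sums the element-wise products
a = (x - x_mean).dot(y - y_mean) / (x - x_mean).dot(x - x_mean)
b = y_mean - a * x_mean

# Cross-check against NumPy's own degree-1 least-squares fit,
# which returns the coefficients from highest degree to lowest
a_np, b_np = np.polyfit(x, y, 1)

print(a, b)
print(a_np, b_np)
```

On large arrays the vectorized version avoids the Python-level loop, which is usually much faster.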

② Draw a scatter plot with the regression line

# Regression line
y_hat = a * x + b

# Draw the scatter plot together with the regression line
plt.scatter(x, y)
plt.plot(x, y_hat, color = 'r')
plt.axis([0, 6, 0, 6])
plt.show()

③ Predict new data

x_predict = 6
y_predict = a * x_predict + b
print(y_predict)
5.2

3. Linear regression with scikit-learn

# Import packages
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

# Create the linear regression object
reg = LinearRegression()

# Simulated data
x = np.array([1., 2., 3., 4., 5.])
x_train = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y_train = np.array([1., 3., 2., 3., 5.])

# Train the model
reg.fit(x_train, y_train)

# Computed slope and intercept
print(reg.coef_)
print(reg.intercept_)
array([0.8])
0.39999999999999947

# Predict on the training data
y_predict = reg.predict(x_train)

# Plot
plt.scatter(x_train, y_train)
plt.plot(x, y_predict, color = 'r')
plt.axis([0, 6, 0, 6])
plt.show()

As expected, the slope and intercept match the values computed by hand in the previous step.
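The fitted scikit-learn model can also predict a new point, such as x = 6 from the hand-computed example. The sketch below refits the model on the same data so that it is self-contained; the names x_new and y_new are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x_train = np.array([1., 2., 3., 4., 5.]).reshape(-1, 1)
y_train = np.array([1., 3., 2., 3., 5.])

reg = LinearRegression()
reg.fit(x_train, y_train)

# predict expects a 2-D array of shape (n_samples, n_features),
# so a single value is passed as a 1 x 1 array
x_new = np.array([[6.]])
y_new = reg.predict(x_new)
print(y_new)
```

The result is a one-element array whose value is approximately 5.2, in line with the manual prediction.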

IV. Multiple linear regression with Python

Multiple linear regression extends simple linear regression to predict a response from two or more features. Mathematically, it can be described as follows:

Consider a data set with n observations, p features (i.e., independent variables), and a response y (i.e., the dependent variable). The regression line for the p features is a weighted sum of the features plus an intercept.
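In standard notation this can be written as below. The symbol names are assumptions chosen for illustration: b_0 is the intercept, b_1 through b_p are the regression coefficients, and x_{ij} is the value of feature j for observation i.

```latex
\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_p x_{ip},
\qquad i = 1, \dots, n
```

With p = 1 this reduces to the simple case Y = aX + b, with a = b_1 and b = b_0.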

1. Load the Boston Housing dataset

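# Import packages
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Load the Boston data
# (load_boston was removed in scikit-learn 1.2, so this requires an older version)
boston = datasets.load_boston()

# Define the feature matrix X and the response vector y
X = boston.data
y = boston.target

# Drop samples with y == 50: prices in this data set are capped at 50,
# so these censored values would distort later predictions
X = X[y < 50.0]
y = y[y < 50.0]

X.shape    # shows (490, 13)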

2. Split the data set into training and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 666)  # random_state is the random seed
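To show what the split produces, here is a small sketch on a synthetic stand-in for X and y (the arrays are assumptions made for illustration, not the Boston data). By default scikit-learn holds out 25% of the samples for testing, and test_size changes that proportion.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a feature matrix and response vector (illustrative only):
# row i of X is [2*i, 2*i + 1] and y[i] = i, so row alignment is easy to verify
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Default split: 25% of the samples go to the test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
print(X_train.shape, X_test.shape)

# test_size changes the proportion held out for testing
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=666)
print(X_train2.shape, X_test2.shape)
```

The rows of X and the entries of y are shuffled together, so each feature row stays paired with its own response value.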

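3. Compute the coefficients and intercept

from sklearn.linear_model import LinearRegression

# Create the linear regression object and train the model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Computed coefficients and intercept (exact values depend on the train/test split)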
print(reg.coef_)    
print(reg.intercept_)  
array([-1.20354261e-01,  3.64423279e-02, -3.61493155e-02,  5.12978140e-02,
       -1.15775825e+01,  3.42740062e+00, -2.32311760e-02, -1.19487594e+00,
        2.60101728e-01, -1.40219119e-02, -8.35430488e-01,  7.80472852e-03,
       -3.80923751e-01])
34.11739972320428
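To make the coefficients and intercept less of a black box, the sketch below recovers them with an ordinary least-squares solve in NumPy and compares the result with LinearRegression. It uses synthetic data with known coefficients (an assumption made for illustration, since the Boston loader is not available in recent scikit-learn versions), and it does not claim to reproduce scikit-learn's internal implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 3 features, known coefficients, small noise
X = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the intercept is estimated as the first parameter
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# Least-squares solution minimizing ||X_b @ theta - y||^2
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
intercept, coef = theta[0], theta[1:]

# Fit the same data with scikit-learn for comparison
reg = LinearRegression().fit(X, y)
print(intercept, coef)
print(reg.intercept_, reg.coef_)
```

Both approaches give the same estimates up to floating-point error, and both land close to the coefficients used to generate the data.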

4. Draw a residual scatter plot

plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train, color = "green", s = 10, label = 'Train data')
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test, color = "blue", s = 10, label = 'Test data')
plt.hlines(y = 0, xmin = 0, xmax = 50, linewidth = 2)
plt.legend(loc = 'upper right')
plt.title("Residual errors")  # residuals
plt.show()

V. Applications

Regression algorithms in ML have the following applications:

1. Forecasting and predictive analysis

One important use of regression analysis is forecasting. For example, we can forecast GDP, oil prices, or, more simply, any quantitative data that changes over time.

2. Optimization

We can optimize business processes with the help of regression. For example, a store manager can build a statistical model to understand when customers visit.

3. Error correction

In business, making correct decisions is as important as optimizing business processes. Regression can help us make correct decisions, and it can also help us correct decisions that have already been implemented.

4. Economics

Regression is the most commonly used tool in economics. We can use it to predict supply, demand, consumption, inventory investment, and so on.

5. Finance

Financial companies are always interested in minimizing the risk of their investment portfolios and want to understand the factors that affect their customers. All of these can be predicted with the help of a regression model.

 

 

 


Origin blog.csdn.net/weixin_40431584/article/details/104677312