Linear Regression from Concept to Practice, with a Boston House Price Prediction Case


2.1 Introduction to linear regression

Learning objectives

  • Understand the application scenarios of linear regression
  • Know the definition of linear regression

1 Linear regression application scenario

  • house price forecast
  • sales forecast
  • Loan amount forecast


2 What is linear regression

2.1 Definition and formula

Linear regression is an analysis method that uses a regression equation (function) to model the relationship between one or more independent variables (features) and a dependent variable (target value).

  • Characteristics: with only one independent variable it is called univariate regression; with more than one it is called multiple regression.

    General formula: $h(w) = w_1x_1 + w_2x_2 + \dots + b = w^Tx + b$

  • Example of linear regression represented by matrix

[Figure: linear regression written in matrix form]

How should we understand this? Let's look at some examples:

  • Final grade = 0.7 × exam grade + 0.3 × daily grade
  • House price = 0.02 × distance from the central area + 0.04 × urban nitric oxide concentration + (−0.12 × average price of owner-occupied homes) + 0.254 × urban crime rate

In these two examples we see that a relationship is established between the feature values and the target value; such a relationship can be understood as a linear model.
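As a minimal sketch of this idea, the weighted sum in the final-grade example can be written as a dot product (the numbers below are illustrative):

import numpy as np

# Feature values for one student: [exam grade, daily grade]
x = np.array([90.0, 80.0])
# Weights from the final-grade example above
w = np.array([0.7, 0.3])
b = 0.0  # bias term (zero in this example)

# A linear model predicts a weighted sum of the features plus a bias
prediction = np.dot(w, x) + b
print(prediction)  # 0.7*90 + 0.3*80 = 87.0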

2.2 Analysis of the relationship between linear regression features and targets

Linear regression covers two main kinds of relationship: linear and nonlinear. Since we can only draw a line or a plane, we use one or two features as examples for easier understanding.

  • linear relationship

    • Univariate linear relationship:

    [Figure: univariate linear relationship]

    • multivariable linear relationship

      [Figure: multivariable linear relationship (a plane)]

Note: the relationship between a single feature and the target value is a straight line; between two features and the target value, a plane.

In higher dimensions we don't need to visualize this ourselves; just remember that the same kind of relationship holds.

  • non-linear relationship

    [Figure: nonlinear relationship between a feature and the target]

Note: why would such a relationship arise? What causes it?

If it is a non-linear relationship, then the regression equation can be understood as:

$$w_1x_1 + w_2x_2^2 + w_3x_3^2$$

3 Summary

  • The definition of linear regression [understand]
    • An analysis method that uses a regression equation (function) to model the relationship between one or more independent variables (features) and a dependent variable (target value)
  • Classification of Linear Regression [Know]
    • linear relationship
    • non-linear relationship

2.2 Initial use of linear regression API

Learning objectives

  • Know the simple use of linear regression API

1 Linear Regression API

1.1. Linear Models — scikit-learn 1.1.2 documentation

  • sklearn.linear_model.LinearRegression()
    • LinearRegression.coef_: the model's weights w
    • LinearRegression.intercept_: the model's bias b

2 Example

[Figure: example — predicting students' final grades from exam and daily grades]

2.1 Step Analysis

  • 1. Get the data set
  • 2. Basic data processing (omitted in this case)
  • 3. Feature engineering (omitted in this case)
  • 4. Machine Learning
  • 5. Model evaluation (omitted in this case)

2.2 Code process

  • Import the module

from sklearn.linear_model import LinearRegression

  • Construct a data set

x = [[80, 86],
     [82, 80],
     [85, 78],
     [90, 90],
     [86, 82],
     [82, 90],
     [78, 80],
     [92, 94]]
y = [84.2, 80.6, 80.1, 90, 83.2, 87.6, 79.4, 93.4]

  • Machine learning: model training

# Instantiate the API
estimator = LinearRegression()
# Train with the fit method
estimator.fit(x, y)
# Inspect the learned coefficients and intercept
print(estimator.coef_)
print(estimator.intercept_)

# Predict for a new sample
estimator.predict([[100, 80]])
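For this constructed data, the fitted weights come out to approximately [0.3, 0.7] with an intercept near 0 (each y value is 0.3 × the first feature plus 0.7 × the second), so the prediction for [100, 80] is about 0.3 × 100 + 0.7 × 80 = 86.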

3 Summary

  • sklearn.linear_model.LinearRegression()
    • LinearRegression.coef_: the model's weights w
    • LinearRegression.intercept_: the model's bias b

2.3 Mathematics: Derivatives

Learning objectives

  • Know the derivatives of common functions
  • Know the four arithmetic operations of derivatives

1 Derivatives of common functions

  • $(C)' = 0$
  • $(x^n)' = nx^{n-1}$
  • $(\sin x)' = \cos x$
  • $(\cos x)' = -\sin x$
  • $(\ln x)' = \frac{1}{x}$
  • $(e^x)' = e^x$
  • $(a^x)' = a^x \ln a$
  • $(\log_a x)' = \frac{1}{x \ln a}$

2 Four arithmetic operations of derivatives

  • $(u + v)' = u' + v'$
  • $(u - v)' = u' - v'$
  • $(uv)' = u'v + uv'$
  • $\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^2}$


3 Matrix (vector) derivation [understand]

[Figure: common scalar-by-vector derivative identities; see the reference link below]

Reference link: https://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-vector_identities

4 Summary

  • Differentiation methods of common functions and the four arithmetic operations of derivatives

2.4 Loss and optimization of linear regression

Learning objectives

  • Know the loss function in linear regression
  • Know the process of optimizing the loss function using gradient descent

Take the house price example from earlier, and suppose the true relationship in the data is:

True relationship: true house price = 0.02 × distance from the central area + 0.04 × urban nitric oxide concentration + (−0.12 × average price of owner-occupied homes) + 0.254 × urban crime rate

Now suppose we randomly specify a relationship (a guess):

Randomly specified relationship: predicted house price = 0.25 × distance from the central area + 0.14 × urban nitric oxide concentration + 0.42 × average price of owner-occupied homes + 0.34 × urban crime rate

What happens then? There is some error between the actual results and our predictions, something like the figure below.

Why is the red line better? Because the red line has less loss.

[Figure: true values vs. two candidate fitted lines; the red line fits with less loss]

Since this error exists, we need a way to measure it.

1 Loss function

Total loss is defined as:

$$J(w) = (h(x_1) - y_1)^2 + (h(x_2) - y_2)^2 + \dots + (h(x_m) - y_m)^2 = \sum_{i=1}^{m}\left(h(x_i) - y_i\right)^2$$

  • $y_i$ is the true value of the i-th training sample
  • $h(x_i)$ is the prediction made from the i-th training sample's feature combination
  • This formula is also known as the least squares loss
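As a minimal sketch, this loss can be computed directly (the numbers below are illustrative):

import numpy as np

y_true = np.array([84.2, 80.6, 80.1])  # true values y_i
y_pred = np.array([85.0, 79.0, 81.0])  # predictions h(x_i)

# Total loss: sum of squared errors
loss = np.sum((y_pred - y_true) ** 2)
print(loss)  # 0.8^2 + 1.6^2 + 0.9^2 = 4.01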

How can we reduce this loss and make our predictions more accurate? The existence of this loss is exactly where machine learning's ability to learn automatically shows itself, and linear regression illustrates it well: we apply optimization methods (mathematically, differentiation) to minimize the total regression loss.

2 Optimization algorithm

How do we find the W in the model that minimizes the loss? (The goal is to find the value of W at which the loss is smallest.)

  • gradient descent method

2.1 What is the gradient descent method?

  • Gradient descent is a method for optimizing a model toward the optimal solution.
  • Mathematically, the gradient generalizes the derivative. The gradient points in the direction in which the function value increases fastest; the opposite direction is the direction in which the function value decreases fastest.
  • The objective function, also called the loss function, measures the gap between the model's predictions and the true values. Our goal is to minimize this gap so that the model's predictions approximate the true values.

The basic idea of gradient descent can be compared to the process of walking down a mountain.

Assume a scenario like this:

A person is trapped on a mountain and needs to get down (i.e., find the lowest point of the mountain, the valley). However, heavy fog has reduced visibility to almost nothing, so the path down cannot be seen; he must use the information around him to find it. This is where the gradient descent algorithm can help.

Specifically: from his current position, he finds the steepest direction and steps toward where the mountain's height drops. (Likewise, if the goal were to climb to the top, he would step in the steepest upward direction.) Repeating this after every stretch of walking, he eventually reaches the valley.

[Figure: gradient descent illustrated as descending a mountain]

The basic process of gradient descent is very similar to the scene of going down a mountain.

First, we have a differentiable function. This function represents the mountain.

Our goal is to find the minimum of this function, which is the bottom of the mountain.

Following the earlier scenario, the fastest way down is to find the steepest direction at the current position and step that way. For a function, that means computing the gradient at the given point and moving in the opposite direction of the gradient, which makes the function value drop fastest, because the gradient direction is the direction in which the function value changes fastest. We repeatedly recompute the gradient and step, eventually reaching a local minimum, just like the descent down the mountain. Computing the gradient plays the role of measuring the steepest direction in the scene.

2.2 The concept of gradient

Gradient is a very important concept in calculus.

  • For a function of a single variable, the gradient is just the derivative, representing the slope of the tangent line at a given point;

  • For a multivariate function, the gradient is a vector, and a vector has a direction: the gradient points in the direction in which the function rises fastest at a given point;

    • Concretely, take the partial derivative of the multivariate function with respect to each parameter and collect the resulting partial derivatives into a vector; that vector is the gradient.

This also explains why we go to such lengths to compute the gradient: to reach the bottom of the mountain, we need the steepest direction at every step, and the gradient tells us exactly that. Since the gradient points in the direction of fastest increase at a point, its opposite is the direction of fastest decrease, which is just what we need. As long as we keep walking against the gradient, we reach a local minimum!

2.3 How to use gradient descent

Given the function

$$f(x, y) = 3x^2 + 4y^2 - 10$$

find its minimum value.

1. Randomly initialize $x$ and $y$, say $x = 6$, $y = 3$.

2. Compute the partial derivatives of $f(x, y)$ with respect to $x$ and to $y$:

$$f'(x) = 6x = 36; \quad f'(y) = 8y = 24$$

3. Adjust $x$ and $y$ slightly, for example:

$$x_{t+1} = x_t - 0.001 f'(x) = 6 - 0.001 \times 36 = 5.964$$

$$y_{t+1} = y_t - 0.001 f'(y) = 3 - 0.001 \times 24 = 2.976$$

4. Repeat steps 2 and 3 in a loop until the function value reaches its minimum (for example, when the difference between two successive function values is 0, we say the minimum has been reached).
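A minimal sketch of this procedure in Python (the learning rate matches the example above; the stopping tolerance is an assumption):

def gradient_descent(x=6.0, y=3.0, lr=0.001, tol=1e-12):
    """Minimize f(x, y) = 3x^2 + 4y^2 - 10 by gradient descent."""
    f = lambda a, b: 3 * a**2 + 4 * b**2 - 10
    prev = f(x, y)
    while True:
        # Partial derivatives: df/dx = 6x, df/dy = 8y
        x = x - lr * 6 * x
        y = y - lr * 8 * y
        cur = f(x, y)
        if abs(prev - cur) < tol:  # stop when the value no longer decreases
            return x, y, cur
        prev = cur

x, y, f_min = gradient_descent()
print(x, y, f_min)  # x and y approach 0; f approaches its minimum, -10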

2.4 General process of gradient descent with two parameters

In a neural network, the $x$ and $y$ of the example above are written $w_1$ and $w_2$, and $f(x, y)$ is called the loss function; our goal is to minimize its value. (We can loosely think of $3x^2 + 4y^2$ as the model's prediction and of 10 as the true value.)

1. Randomly initialize the values of $w_1$ and $w_2$.

2. Loop until convergence: for $i = 1 \ldots$ convergence:

$$w_1 = w_1 - \alpha \Delta w_1$$

$$w_2 = w_2 - \alpha \Delta w_2$$

2.5 Various forms of gradient descent methods

1. SGD (stochastic gradient descent): use a single sample when computing the gradient.

2. Mini-batch gradient descent: use a batch of samples when computing the gradient.

3. Batch gradient descent: use all the data when computing the gradient.

2.6 Normal equation

2.6.1 What is the normal equation?

Starting from $XW = Y$, multiply both sides on the left by $X^T$, then by $(X^TX)^{-1}$:

$$XW = Y$$
$$X^TXW = X^TY$$
$$(X^TX)^{-1}X^TXW = (X^TX)^{-1}X^TY$$

This yields the closed-form solution:

$$W = (X^TX)^{-1}X^TY$$
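A minimal sketch of solving the normal equation directly with NumPy (the data is illustrative; the leading column of ones absorbs the bias b):

import numpy as np

# Design matrix with a column of ones for the bias
X = np.array([[1.0, 80, 86],
              [1.0, 82, 80],
              [1.0, 85, 78],
              [1.0, 90, 90]])
Y = np.array([84.2, 80.6, 80.1, 90.0])

# W = (X^T X)^{-1} X^T Y
W = np.linalg.inv(X.T @ X) @ X.T @ Y
print(W)  # [b, w1, w2] -- approximately [0, 0.3, 0.7] for this data

In practice, np.linalg.lstsq (or np.linalg.pinv) is numerically safer than forming the inverse explicitly.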

2.5 Reintroduction to linear regression API

Learning objectives

  • Understand the API and common parameters of normal equations
  • Understand the gradient descent method API and common parameters

  • sklearn.linear_model.LinearRegression(fit_intercept=True)
    • Optimization via normal equations
    • parameter
      • fit_intercept: whether to compute the bias. If set to False, b is not computed and the fitted line passes through the origin.
    • Attributes
      • LinearRegression.coef_: regression coefficient
      • LinearRegression.intercept_: bias
  • sklearn.linear_model.SGDRegressor(loss="squared_loss", fit_intercept=True, learning_rate='invscaling', eta0=0.01)
    • The SGDRegressor class implements stochastic gradient descent learning and supports different loss functions and regularization penalties for fitting linear regression models.
    • Parameters:
      • loss: loss type
        • loss="squared_loss": ordinary least squares
      • fit_intercept: whether to compute the bias
      • eta0: learning rate
    • Attributes:
      • SGDRegressor.coef_: regression coefficient
      • SGDRegressor.intercept_: bias

sklearn provides both API implementations; choose whichever fits your needs.

Summary

  • normal equation
    • sklearn.linear_model.LinearRegression()
  • gradient descent method
    • sklearn.linear_model.SGDRegressor()

2.6 Case: Boston house price prediction

Learning objectives

  • Master the use of normal equations and gradient descent method API through cases

1 Case background introduction

  • Data introduction

[Table: the 13 Boston housing features — CRIM (crime rate), ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT — and the target MEDV (median home value)]

The given features are attributes that experts have concluded influence housing prices. At this stage we do not need to examine whether each feature is useful; we simply use them all. Later, when many features have to be quantified, we will need to select them ourselves.

2 Case analysis

If the features in a regression are on inconsistent scales, the results can be strongly affected. Therefore, standardization is required.

  • Data segmentation and standardization
  • regression prediction
  • Evaluation of Algorithm Effects of Linear Regression

3 Regression performance evaluation

Mean Squared Error (MSE) evaluation mechanism:

$$MSE = \frac{1}{m}\sum_{i=1}^{m}\left(y^i - \bar{y}\right)^2$$

Note: $y^i$ is the predicted value and $\bar{y}$ is the true value.

Thinking: What is the difference between MSE and least squares method?

  • sklearn.metrics.mean_squared_error(y_true, y_pred)
    • mean square error regression loss
    • y_true: true value
    • y_pred: predicted value
    • return: floating point result

4 Code implementation

4.1 Normal equation

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

def linear_model1():
    """
    Linear regression: normal equation
    :return: None
    """
    # 1. Load the data
    # (Note: load_boston was removed in scikit-learn 1.2; this follows the original tutorial.)
    data = load_boston()

    # 2. Split the data set
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. Feature engineering: standardization
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4. Machine learning: linear regression (normal equation)
    estimator = LinearRegression()
    estimator.fit(x_train, y_train)

    # 5. Model evaluation
    # 5.1 Predictions, coefficients, and intercept
    y_predict = estimator.predict(x_test)
    print("Predictions:\n", y_predict)
    print("Model coefficients:\n", estimator.coef_)
    print("Model intercept:\n", estimator.intercept_)

    # 5.2 Evaluation: mean squared error
    error = mean_squared_error(y_test, y_predict)
    print("Mean squared error:\n", error)

    return None

linear_model1()

4.2 Gradient descent method

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import SGDRegressor

def linear_model2():
    """
    Linear regression: gradient descent
    :return: None
    """
    # 1. Load the data
    data = load_boston()

    # 2. Split the data set
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. Feature engineering: standardization
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4. Machine learning: linear regression (gradient descent)
    estimator = SGDRegressor(max_iter=1000)
    estimator.fit(x_train, y_train)

    # 5. Model evaluation
    # 5.1 Predictions, coefficients, and intercept
    y_predict = estimator.predict(x_test)
    print("Predictions:\n", y_predict)
    print("Model coefficients:\n", estimator.coef_)
    print("Model intercept:\n", estimator.intercept_)

    # 5.2 Evaluation: mean squared error
    error = mean_squared_error(y_test, y_predict)
    print("Mean squared error:\n", error)

    return None

linear_model2()

We can also try modifying the learning rate:

estimator = SGDRegressor(max_iter=1000, eta0=0.1)

By tuning this parameter, we can find a learning rate that works better, as in the sketch below.
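A small parameter sweep might look like this (the candidate values are arbitrary; it reuses x_train, x_test, y_train, and y_test from the code above):

# Try several learning rates and compare the test error
for eta in [0.001, 0.01, 0.1]:
    estimator = SGDRegressor(max_iter=1000, eta0=eta)
    estimator.fit(x_train, y_train)
    error = mean_squared_error(y_test, estimator.predict(x_test))
    print("eta0=%s: MSE=%.2f" % (eta, error))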

5 Summary

  • The use of normal equations and gradient descent method API in real cases [Know]
  • Linear Regression Performance Evaluation [Know]
    • mean square error

2.7 Underfitting and overfitting

Learning objectives

  • Master the concepts of overfitting and underfitting
  • Understand the causes of overfitting and underfitting
  • Know what regularization is and its classification

1 Definition

  • Overfitting: a hypothesis fits the training data better than other hypotheses do, but fails to fit the test data set well. The hypothesis is then said to overfit. (The model is too complex.)
  • Underfitting: a hypothesis fits neither the training data nor the test data set well. The hypothesis is then said to underfit. (The model is too simple.)

[Figure: underfitting, a good fit, and overfitting on the same data]

So what makes a model complex? This corresponds to the two relationships of linear regression mentioned earlier: when the data is nonlinear — that is, when there are many useless features, or when the real relationship between the features and the target is not simply linear — training a linear regression model on it pushes the model toward complexity.

2 Causes and solutions

  • Causes of and solutions to underfitting
    • Cause: the model learned too few features of the data
    • Solutions:
      • 1) Add other feature items. Underfitting is sometimes caused by having too few features, and adding more can solve it well. For example, "combination", "generalization", and "correlation" features are three important ways of adding features; they can be applied in almost any scenario and often yield surprisingly good results. Beyond these, "context features", "platform features", and so on are also good candidates for feature addition.
      • 2) Add polynomial features, a technique commonly used in machine learning. For example, adding quadratic or cubic terms to a linear model gives it stronger generalization ability.
  • Causes of and solutions to overfitting
    • Cause: there are too many original features, including noisy ones, and the model becomes too complex because it tries to account for every data point
    • Solutions:
      • 1) Re-clean the data. Overfitting can be caused by impure data; if it occurs, the data may need to be cleaned again.
      • 2) Increase the amount of training data. Another cause is too little training data, i.e., the training data makes up too small a proportion of all the data.
      • 3) Regularization
      • 4) Reduce the feature dimensionality to avoid the curse of dimensionality

3 Regularization

3.1 What is regularization

To solve overfitting in regression we choose regularization. The same problem also occurs in other machine learning algorithms, such as classification algorithms. Besides what some algorithms do internally (decision trees, neural networks), we also perform feature selection ourselves, including deleting and merging features as mentioned earlier.

[Figure: an overfitted high-degree polynomial fit]

How to solve?

[Figure: the same fit after regularization]

During learning, some of the features in the data influence the model's complexity, or the model weights certain features too heavily. Regularization means the algorithm tries to reduce the influence of such features (or even remove a feature's influence entirely) while learning.

Note: during this adjustment the algorithm does not know which feature's influence to reduce; it arrives at the optimized result by adjusting the parameters.

3.2 Regularization categories

  • L2 regularization
    • Effect: makes some of the weights W very small, close to 0, weakening the influence of certain features.
    • Advantage: the smaller the parameters, the simpler the model, and a simpler model is less prone to overfitting.
    • Ridge regression
  • L1 regularization
    • Effect: can set some of the W values directly to 0, removing those features' influence entirely.
    • LASSO regression
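A minimal sketch illustrating this difference on synthetic data (the data generation and the alpha values are arbitrary choices):

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
# Only the first two features actually matter
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.randn(100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("Ridge:", ridge.coef_)  # useless weights shrunk toward 0, but still nonzero
print("Lasso:", lasso.coef_)  # useless weights driven exactly to 0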

4 Summary

  • Underfitting [Master]
    • Perform poorly on the training set and perform poorly on the test set
    • Solution:
      • Continue studying
        • 1. Add other features
        • 2. Add polynomial features
  • Overfitting [Master]
    • Perform well on the training set, perform poorly on the test set
    • Solution:
      • 1. Re-clean the data set
      • 2. Increase the amount of training data
      • 3. Regularization
      • 4. Reduce feature dimensions
  • Regularization [Master]
    • Prevents overfitting by limiting the coefficients of higher-order terms
    • L1 regularization
      • Intuition: drives the coefficients of some terms directly to 0
      • Lasso regression
    • L2 regularization
      • Intuition: shrinks the coefficients of some terms to particularly small values
      • Ridge regression

2.8 Regularized linear model

Learning objectives

  • Know ridge regression, a regularized linear model
  • Know lasso regression, a regularized linear model
  • Know the elastic net, a regularized linear model
  • Understand early stopping as a regularization technique

  • Ridge regression
  • Lasso regression
  • Elastic Net
  • Early stopping

1 Ridge Regression

Ridge regression is a regularized version of linear regression: a regularization term, $\alpha\sum_{i=1}^{n}\theta_i^2$, is added to the cost function of ordinary linear regression.

To fit the data while keeping the model weights as small as possible, the ridge regression cost function is:

$$J(\theta) = MSE(\theta) + \alpha\sum_{i=1}^{n}\frac{1}{2}\theta_i^2$$

  • α=0: Ridge regression degenerates into linear regression

2 Lasso Regression

Lasso regression is another regularized version of linear regression; its regularization term is the $\ell_1$ norm of the weight vector.

Cost function of Lasso regression:

$$J(\theta) = MSE(\theta) + \alpha\sum_{i=1}^{n}|\theta_i|$$

[Note]

  • The cost function of lasso regression is not differentiable at $\theta_i = 0$.
  • Solution: replace the gradient at $\theta_i = 0$ with a subgradient vector, for example:
  • Subgradient vector of lasso regression:

$$g(\theta, J) = \nabla_\theta MSE(\theta) + \alpha\begin{pmatrix}\operatorname{sign}(\theta_1)\\ \vdots \\ \operatorname{sign}(\theta_n)\end{pmatrix}, \quad \operatorname{sign}(\theta_i) = \begin{cases}-1 & \theta_i < 0\\ 0 & \theta_i = 0\\ +1 & \theta_i > 0\end{cases}$$

A very important property of lasso regression is that it tends to eliminate the weights of the least important features entirely.

For example, when α is relatively large, a high-degree polynomial model degenerates to quadratic or even linear: the weights of the high-degree polynomial features are driven to 0.

In other words, lasso regression automatically performs feature selection and outputs a sparse model (only a few features have nonzero weights).

3 Elastic Net

The elastic net is a middle ground between ridge regression and lasso regression, controlled by the mix ratio r:

  • r = 0: the elastic net is equivalent to ridge regression
  • r = 1: the elastic net is equivalent to lasso regression

Cost function of elastic network:

$$J(\theta) = MSE(\theta) + r\alpha\sum_{i=1}^{n}|\theta_i| + \frac{1-r}{2}\alpha\sum_{i=1}^{n}\theta_i^2$$

Generally speaking, we should avoid plain, unregularized linear regression and apply some regularization to the model. So how do we choose a regularization method?

Summary:

  • Commonly used: ridge regression

  • If you suspect that only a few features are useful:

    • elastic net
    • lasso
    • In general, the elastic net is more widely applicable, because lasso regression behaves erratically when the feature dimension exceeds the number of training samples or when features are strongly correlated.
  • api:

    • from sklearn.linear_model import Ridge, ElasticNet, Lasso
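As a usage note, sklearn's ElasticNet exposes the mix ratio r as l1_ratio (a sketch; the parameter values are illustrative):

from sklearn.linear_model import ElasticNet

# l1_ratio plays the role of r in the cost function:
# l1_ratio=0 behaves like ridge regression, l1_ratio=1 like lasso
model = ElasticNet(alpha=0.1, l1_ratio=0.5)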
      

4 Early Stopping [Understand]

Early stopping is another way to regularize iterative learning algorithms.

The approach is to stop training as soon as the validation error reaches its minimum.
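As a sketch, sklearn's SGDRegressor supports this through its early_stopping option, which holds out part of the training data for validation and stops when the validation score stops improving (the parameter values here are illustrative):

from sklearn.linear_model import SGDRegressor

model = SGDRegressor(
    max_iter=1000,
    early_stopping=True,      # hold out part of the training data as a validation set
    validation_fraction=0.1,  # fraction of training data used for validation
    n_iter_no_change=5,       # stop after 5 epochs without improvement
)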

5 Summary

  • Ridge regression
    • Adds a squared (L2) penalty on the coefficients
    • This limits the size of the coefficient values
    • The smaller α is, the larger the coefficients; the larger α is, the smaller the coefficients
  • Lasso regression
    • Applies an absolute-value (L1) penalty to the coefficients
    • Because the absolute value is not differentiable at its vertex, the computation produces many exact zeros; the final result is a sparse model
  • Elastic Net
    • A combination of the two above
    • Sets a ratio r: r = 0 gives ridge regression; r = 1 gives lasso regression
  • Early stopping
    • Stops training by thresholding the validation error

2.9 An improvement on linear regression: ridge regression

Learning objectives

  • Know the specific use of ridge regression api

1 API

  • sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto", normalize=False)
    • Linear regression with L2 regularization
    • alpha: regularization strength, also written λ
      • typical λ values: 0–1 or 1–10
    • solver: automatically selects an optimization method based on the data
      • sag: chosen when the data set and feature count are large (stochastic average gradient descent)
    • Ridge.coef_: regression weights
    • Ridge.intercept_: regression bias

2 How does the degree of regularization affect the results?

[Figure: ridge regression weights as a function of the regularization strength alpha]

  • The stronger the regularization, the smaller the weight coefficient will be.
  • The smaller the regularization strength, the larger the weight coefficient will be.
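A minimal sketch demonstrating this on synthetic data (the data and the alpha grid are illustrative):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X @ np.array([4.0, -2.0, 1.0]) + 0.5 * rng.randn(50)

for alpha in [0.01, 1, 100]:
    model = Ridge(alpha=alpha).fit(X, y)
    # Stronger regularization (larger alpha) -> smaller weights
    print("alpha=%s: sum |coef| = %.3f" % (alpha, np.abs(model.coef_).sum()))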

3 Boston House Price Forecast

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import Ridge

def linear_model3():
    """
    Linear regression: ridge regression
    :return:
    """
    # 1. Load the data
    data = load_boston()

    # 2. Split the data set
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. Feature engineering: standardization
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    # Use transform (not fit_transform) on the test set so it reuses the training statistics
    x_test = transfer.transform(x_test)

    # 4. Machine learning: linear regression (ridge regression)
    estimator = Ridge(alpha=1)
    estimator.fit(x_train, y_train)

    # 5. Model evaluation
    # 5.1 Predictions, coefficients, and intercept
    y_predict = estimator.predict(x_test)
    print("Predictions:\n", y_predict)
    print("Model coefficients:\n", estimator.coef_)
    print("Model intercept:\n", estimator.intercept_)

    # 5.2 Evaluation: mean squared error
    error = mean_squared_error(y_test, y_predict)
    print("Mean squared error:\n", error)

linear_model3()

4 Summary

  • sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto") [Know]
    • Linear regression with L2 regularization
    • alpha: regularization strength
      • The stronger the regularization, the smaller the weight coefficients.
      • The weaker the regularization, the larger the weight coefficients.

2.10 Polynomial Regression and Pipeline

Learning objectives

  • Know the specific use of polynomial regression API

1 API

  • sklearn.preprocessing.PolynomialFeatures(degree=2)

    • Generates polynomial features
    • degree: generates all polynomial combinations of the features up to the given degree. For example, for two features [a, b] and degree=2 it generates $[1, a, b, a^2, ab, b^2]$
  • sklearn.pipeline.Pipeline(steps=[('scaler', StandardScaler()), ('lin_reg', LinearRegression())])

    • Chains processing steps into a pipeline
    • steps: list of (name, transform) tuples
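A minimal sketch combining the two in a polynomial regression pipeline (the quadratic data is illustrative):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + 0.1 * rng.randn(100)

model = Pipeline(steps=[
    ('poly', PolynomialFeatures(degree=2)),  # adds 1, x, x^2 as features
    ('scaler', StandardScaler()),
    ('lin_reg', LinearRegression()),
])
model.fit(X, y)
print(model.predict([[2.0]]))  # close to 0.5*4 + 2 + 2 = 6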

2.11 Saving and loading models

Learning objectives

  • Know how to save and load models in sklearn

1 sklearn model saving and loading API

  • from sklearn.externals import joblib (in older versions of sklearn; recent versions removed sklearn.externals, so use import joblib directly, as in the code below)
    • Save: joblib.dump(model, path)
    • Load: estimator = joblib.load(path)

2 Linear regression model saving and loading case

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import Ridge
import joblib

def load_dump():
    """
    Model saving and loading
    :return:
    """
    # 1. Load the data
    data = load_boston()

    # 2. Split the data set
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. Feature engineering: standardization
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    # Use transform (not fit_transform) on the test set so it reuses the training statistics
    x_test = transfer.transform(x_test)

    # 4. Machine learning: linear regression (ridge regression)
    # 4.1 Model training
    estimator = Ridge(alpha=1)
    estimator.fit(x_train, y_train)

    # 4.2 Save the model
    joblib.dump(estimator, "./data/test.pkl")

#     # 4.3 Load the model
#     estimator = joblib.load("./data/test.pkl")

    # 5. Model evaluation
    # 5.1 Predictions, coefficients, and intercept
    y_predict = estimator.predict(x_test)
    print("Predictions:\n", y_predict)
    print("Model coefficients:\n", estimator.coef_)
    print("Model intercept:\n", estimator.intercept_)

    # 5.2 Evaluation: mean squared error
    error = mean_squared_error(y_test, y_predict)
    print("Mean squared error:\n", error)

load_dump()

3 Summary

  • joblib [Know]
    • Save: joblib.dump(estimator, 'test.pkl')
    • Load: estimator = joblib.load('test.pkl')
    • Note:
      • 1. Save the model to a file with the .pkl suffix.
      • 2. When loading, assign the model to a variable.

Source: blog.csdn.net/weixin_52733693/article/details/127091657