[Machine Learning] Regression -- Polynomial Regression

First of all, we need to clarify a concept. The linearity or nonlinearity we are discussing refers to the coefficients of the model, not to the independent variable itself. No matter what powers the independent variable is raised to, as long as the model is linear in its coefficients, we call it linear. This is why polynomial regression can still be described as a linear regression.
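To make the distinction concrete, here is a worked pair of examples (my own illustration, not from the original post):

y = b_0 + b_1 x + b_2 x^2 is still a linear model, because it is linear in the coefficients b_0, b_1, b_2, even though x appears squared.
y = b_0 + e^(b_1 x) is a nonlinear model, because the coefficient b_1 sits inside the exponential.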

figure 2.14: y = b_0 + b_1 x + b_2 x^2 + ... + b_n x^n

From this formula, we can see that there is only one independent variable, x, but it appears at several different powers.

The data we use this time is the salary corresponding to different position levels within a company.
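Judging from the indexing code below, Position_Salaries.csv has three columns: a position name, a numeric level, and a salary. The file presumably looks something like this (illustrative rows, not reproduced from the post):

Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
...
CEO,10,1000000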


Let's take a look at how this is implemented in Python:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
# Note: 1:2 still selects only the one column at index 1, but unlike writing just 1,
# it returns a matrix (2-D array) rather than a single vector.
y = dataset.iloc[:, 2].values
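As a quick check I like to print the shapes (the row count assumes the usual ten-row version of this file). scikit-learn estimators want a 2-D feature array, which is exactly what the 1:2 slice guarantees:

print(X.shape)  # (10, 1): a matrix with a single column
print(y.shape)  # (10,): a plain vector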

Now let's get to the main topic and start the polynomial regression:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 1) # degree is the highest power of the independent variable
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
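Before fitting, it is worth peeking at what PolynomialFeatures actually produced (a quick check I added; the exact values depend on your data):

print(X_poly[:3])
# With degree = 1, each row is [1, x]: a bias column plus the original feature.
# At degree = 4, each row would become [1, x, x^2, x^3, x^4].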

Here we set degree = 1, which means the independent variable appears only to the first power; this is equivalent to simple linear regression. Let's visualize it:

# Display the fit in a figure
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

figure 2.17: the polynomial regression fit with degree = 1

This plot is exactly the same as the one produced by simple linear regression:

# Simple linear regression, displayed in a figure
# (lin_reg is an ordinary LinearRegression fitted directly on X)
lin_reg = LinearRegression()
lin_reg.fit(X, y)
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X), color = 'blue')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

figure 2.18: the simple linear regression fit

Let's try changing the degree: set it to 2 and leave everything else unchanged. Re-run the code (as sketched below) and look at the image:
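For clarity, the only change is the degree argument:

poly_reg = PolynomialFeatures(degree = 2) # quadratic term added
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)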

figure 2.19: the polynomial regression fit with degree = 2

We can see that the overall trend now fits the distribution of the data.

Let's change the degree to 3 and then 4 and see the results:
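Both fits can be produced with the same few lines; here is a compact loop (my own sketch, not code from the original post) that re-fits and re-plots for each degree:

for degree in (3, 4):
    poly_reg = PolynomialFeatures(degree = degree)
    lin_reg_2 = LinearRegression()
    lin_reg_2.fit(poly_reg.fit_transform(X), y)
    plt.scatter(X, y, color = 'red')
    plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
    plt.title('Truth or Bluff (degree = %d)' % degree)
    plt.xlabel('Position level')
    plt.ylabel('Salary')
    plt.show()

After the loop, poly_reg and lin_reg_2 hold the degree = 4 model, which is what the smoothing step below uses.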

figure 2.20: the polynomial regression fit with degree = 3

figure 2.21: the polynomial regression fit with degree = 4

We can see that when degree = 4, the curve passes close to essentially all of the data points.

The curve above is only evaluated at the original x values, which makes it look angular. We can smooth it by predicting on a finer grid along the x-axis:

X_grid = np.arange(min(X), max(X), 0.1)   # a fine grid with step 0.1 instead of the original levels
X_grid = X_grid.reshape((len(X_grid), 1)) # reshape into a matrix, as the transformer expects
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, lin_reg_2.predict(poly_reg.fit_transform(X_grid)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

figure 2.22: the degree = 4 fit evaluated on the finer grid

Finally, let's feed in two test values, 6 and 10, and look at the predicted salaries:

lin_reg_2.predict(poly_reg.fit_transform([[6]]))   # note the [[...]]: the transformer expects a 2-D array
lin_reg_2.predict(poly_reg.fit_transform([[10]]))

figure 2.23: the predicted salaries for levels 6 and 10

Both predictions are fairly close to the actual values.
