Machine Learning---Regression Code

1. Plotting function

%matplotlib inline
import matplotlib.pyplot as plt
def runplt():
    plt.figure()
    plt.title('diameter-cost curve')
    plt.xlabel('diameter')
    plt.ylabel('cost')
    plt.axis([0, 25, 0, 25])
    plt.grid(True)
    return plt

plt = runplt()
X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
plt.plot(X, y, 'k.')
plt.show()

2. Predict prices

from sklearn.linear_model import LinearRegression
import numpy as np
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
print('Predicted price of a 12-inch pizza: $%.2f' % model.predict(np.array([12]).reshape(-1, 1))[0])

        model = LinearRegression(): initializes an instance of a linear regression model.

        This model can then be used to:

Train the model: use the .fit() method, passing in the feature data X and the label data y.

Predict: use the .predict() method, passing in new feature data to obtain predictions.

Inspect the model parameters (weights): through the .coef_ and .intercept_ attributes, as sketched below.
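A minimal sketch of inspecting a fitted model's parameters (reusing the model, X, and y defined above):

# Inspect the learned parameters of the fitted model
print('coef_:', model.coef_)            # weight(s), one per feature
print('intercept_:', model.intercept_)  # bias term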

           The fit() method trains the model using the given input features X and the corresponding target values y. The input features X should be a two-dimensional array-like object, such as a NumPy array or a pandas DataFrame, in which each row represents a sample and each column represents a feature. The target values y should be a one-dimensional array-like object, such as a NumPy array or a pandas Series, containing the target value corresponding to each sample.
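For illustration, a quick check of the expected shapes (X_demo and y_demo are hypothetical example names):

import numpy as np
X_demo = np.array([[6], [8], [10]])  # 2-D: 3 samples, 1 feature
y_demo = np.array([7, 9, 13])        # 1-D: one target per sample
print(X_demo.shape, y_demo.shape)    # (3, 1) (3,)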

    reshape(-1, 1) converts a one-dimensional array or vector into a column vector. The parameter -1 means that the size of that dimension is computed automatically, and 1 means the resulting array has a single column. np.array([12]).reshape(-1, 1) therefore creates a NumPy array containing the single value 12 and reshapes it into a column vector. This reshaped input is then passed to the predict() method, which returns the prediction; indexing with [0] retrieves the first element of the result. So the purpose of the whole line is to predict for the input feature value 12 and return the first element of the prediction.
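A small illustration of what reshape(-1, 1) does (the array values here are arbitrary):

import numpy as np
a = np.array([6, 8, 10])
print(a.shape)                 # (3,)
print(a.reshape(-1, 1))        # [[ 6] [ 8] [10]]: a 3x1 column vector
print(a.reshape(-1, 1).shape)  # (3, 1)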

plt = runplt()
plt.plot(X, y, 'k.')
X2 = [[0], [10], [14], [25]]
model = LinearRegression()
model.fit(X, y)
y2 = model.predict(X2)
plt.plot(X2, y2, 'g-')
plt.show()

plt = runplt()
plt.plot(X, y, 'k.')
X2 = [[0], [10], [14], [25]]
model = LinearRegression()
model.fit(X, y)
y2 = model.predict(X2)

plt.plot(X2, y2, 'g-')

# Predicted values for computing the residuals
yr = model.predict(X)
for idx, x in enumerate(X):
    plt.plot([x, x], [y[idx], yr[idx]], 'r-')

plt.show()

print('Mean squared residual: %.2f' % np.mean((model.predict(X) - y) ** 2))
# Mean squared residual: 1.75
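Note that np.mean() gives the mean of the squared residuals; the sum would use np.sum() instead. A minimal sketch of the distinction (reusing model, X, and y from above):

residuals = model.predict(X) - y
print('Mean squared residual: %.2f' % np.mean(residuals ** 2))
print('Residual sum of squares: %.2f' % np.sum(residuals ** 2))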

3. Multiple linear regression and polynomial regression

from sklearn.linear_model import LinearRegression
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [[7], [9], [13], [17.5], [18]]
model = LinearRegression()
model.fit(X, y)
X_test = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]
y_test = [[11], [8.5], [15], [18], [11]]
predictions = model.predict(X_test)
for i, prediction in enumerate(predictions):
    print('Predicted: %s, Target: %s' % (prediction, y_test[i]))
print('R-squared: %.2f' % model.score(X_test, y_test))

    enumerate() is a built-in function used to obtain both the index and the value of each element during iteration. enumerate(iterable) returns an iterator that yields (index, value) tuples, where index is the position of the element in the iteration and value is the corresponding element. enumerate() makes it convenient to get the index and value at the same time inside a loop, which is especially useful when you need to track an element's position, for example when processing iterables such as lists and strings.
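For example, iterating over a short list with enumerate():

for index, value in enumerate(['a', 'b', 'c']):
    print(index, value)  # prints 0 a, then 1 b, then 2 c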

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
X_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]
# Fit a linear regression and plot the fitted model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
xx = np.linspace(0, 26, 100)
yy = regressor.predict(xx.reshape(xx.shape[0], 1))
plt = runplt()
plt.plot(X_train, y_train, 'k.')
plt.plot(xx, yy)

quadratic_featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)
regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, y_train)
xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))
plt.plot(xx, regressor_quadratic.predict(xx_quadratic), 'r-')
plt.show()
print(X_train)
print(X_train_quadratic)
print(X_test)
print(X_test_quadratic)
print('1 r-squared', regressor.score(X_test, y_test))
print('2 r-squared', regressor_quadratic.score(X_test_quadratic, y_test))

 

    linspace(0, 26, 100) generates an array of 100 evenly spaced numbers, starting at 0 and ending at 26.
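For example, with fewer points the even spacing is easy to see:

import numpy as np
print(np.linspace(0, 26, 5))  # [ 0.   6.5 13.  19.5 26. ]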

    reshape(xx.shape[0], 1) reshapes the array xx into a column vector. xx is a one-dimensional array; calling reshape() with the arguments (xx.shape[0], 1) converts it into a two-dimensional array with xx.shape[0] rows and 1 column. xx.shape[0] is the length of xx, i.e. the number of elements. Reshaping to (xx.shape[0], 1) turns the array into a column vector in which each element occupies its own row, which is the shape predict() expects.

    quadratic_featurizer = PolynomialFeatures(degree=2) instantiates the PolynomialFeatures class, which generates quadratic polynomial features. PolynomialFeatures is a scikit-learn class for generating polynomial features; the degree parameter controls the maximum degree of the generated polynomial. Setting degree to 2 here creates quadratic_featurizer, a quadratic polynomial feature generator. The generated quadratic features can be used to train regression or classification models; for example, they can be fed into a linear regression model to fit a quadratic curve. quadratic_featurizer can then be used to transform the data, converting the original features into quadratic polynomial features. The fit_transform() method performs the fitting and the transformation in one step, producing the new feature matrix.
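To make the transformation concrete, this is what degree=2 produces for a single feature x: a bias column, x itself, and x squared:

from sklearn.preprocessing import PolynomialFeatures
pf = PolynomialFeatures(degree=2)
print(pf.fit_transform([[6]]))  # [[ 1.  6. 36.]]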

         When using the PolynomialFeatures class to apply the quadratic polynomial transformation to the training and test sets, first call quadratic_featurizer.fit_transform(X_train) to fit and transform the training features X_train, producing a new feature matrix X_train_quadratic that contains the original features together with their quadratic terms. This new feature matrix is used to train the model.

          Then call quadratic_featurizer.transform(X_test) to transform the test features X_test, producing a new feature matrix X_test_quadratic with the same columns as the training feature matrix. This matrix is used for prediction and evaluation with the trained model.

          Using fit_transform() on the training set ensures that the generated feature matrix contains all the feature combinations present in the training data. transform() is then used on the test set to keep the features consistent: the same feature transformation is applied in both the training and testing phases, so the model sees new data in exactly the same representation it was trained on. This avoids the problem of inconsistent feature representations between training and test data.

          Calling score(X_test, y_test) computes the model's goodness-of-fit score on the test set. This score is commonly used to evaluate model performance and prediction accuracy; exactly how it is computed depends on the model. For regression models it is the R-squared value (the coefficient of determination), which measures how much of the variance in the target variable the model explains. The closer R-squared is to 1, the stronger the model's explanatory power and the better the fit; the closer it is to 0, the weaker the explanatory power and the poorer the fit (on held-out data it can even be negative when the model fits worse than simply predicting the mean). R-squared is computed by comparing the variance of the model's prediction errors with the variance of the target variable itself.
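A minimal sketch of computing R-squared by hand, matching what score() returns (reusing regressor, X_test, and y_test from the code above):

import numpy as np
y_true = np.array(y_test).ravel()
y_pred = regressor.predict(X_test).ravel()
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
print('R-squared:', 1 - ss_res / ss_tot)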

plt = runplt()
plt.plot(X_train, y_train, 'k.')

quadratic_featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)
regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, y_train)
xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))
plt.plot(xx, regressor_quadratic.predict(xx_quadratic), 'r-')

cubic_featurizer = PolynomialFeatures(degree=3)
X_train_cubic = cubic_featurizer.fit_transform(X_train)
X_test_cubic = cubic_featurizer.transform(X_test)
regressor_cubic = LinearRegression()
regressor_cubic.fit(X_train_cubic, y_train)
xx_cubic = cubic_featurizer.transform(xx.reshape(xx.shape[0], 1))
plt.plot(xx, regressor_cubic.predict(xx_cubic))
plt.show()
print(X_train_cubic)
print(X_test_cubic)
print('2 r-squared', regressor_quadratic.score(X_test_quadratic, y_test))
print('3 r-squared', regressor_cubic.score(X_test_cubic, y_test))

plt = runplt()
plt.plot(X_train, y_train, 'k.')

quadratic_featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)
regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, y_train)
xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))
plt.plot(xx, regressor_quadratic.predict(xx_quadratic), 'r-')

seventh_featurizer = PolynomialFeatures(degree=7)
X_train_seventh = seventh_featurizer.fit_transform(X_train)
X_test_seventh = seventh_featurizer.transform(X_test)
regressor_seventh = LinearRegression()
regressor_seventh.fit(X_train_seventh, y_train)
xx_seventh = seventh_featurizer.transform(xx.reshape(xx.shape[0], 1))
plt.plot(xx, regressor_seventh.predict(xx_seventh))
plt.show()
print('2 r-squared', regressor_quadratic.score(X_test_quadratic, y_test))
print('7 r-squared', regressor_seventh.score(X_test_seventh, y_test))


Origin: blog.csdn.net/weixin_43961909/article/details/132135478