Data mining: model selection - supervised learning (regression)


The article Data Mining: Model Selection - Supervised Learning (Classification) introduced the main classification algorithms; this time we cover several regression algorithms. The main goal of regression is to predict continuous values.
This article is based on: the Machine Learning classroom of Cai Cai

I. Linear regression

Brief introduction:
In the simplest case linear regression is y = ax + b, which applies when there is only one feature; with multiple features it becomes multiple linear regression. The formula is as follows:
ŷ = θ0 + θ1*x1 + θ2*x2 + ... + θn*xn = Xθ (adding a constant feature x0 = 1 absorbs the intercept θ0)
Here θ is the weight assigned to each independent variable: the larger the weight, the more important that feature is to the result, so the weights can also be used to screen features. These parameters θ are exactly what we need to solve for.

How it works:

As with most machine learning problems, we first define a loss function and then solve for the parameters by optimizing it.
Here the loss function is the residual sum of squares between the true and predicted values, i.e. we look for the θ that minimizes it. This is the least squares method:
min over θ of Σ (y_i - ŷ_i)² = min over θ of ||y - Xθ||²
There are two ways to solve it: the matrix (closed-form) solution and gradient descent.
Closed-form matrix solution:
Setting the derivative of the loss with respect to θ to zero gives the normal equation θ = (X^T X)^(-1) X^T y.
Gradient descent:
Starting from an initial θ, repeatedly take a step against the gradient of the loss, θ := θ - η * ∂J(θ)/∂θ, where η is the learning rate, until the loss stops decreasing.
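Below is a minimal NumPy sketch of both solvers on synthetic data (the data, learning rate and iteration count are mine, chosen only for illustration):

import numpy as np

# toy data: 100 samples, 3 features (synthetic, for illustration only)
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 3 + rng.randn(100) * 0.1

# add a column of ones so that theta[0] plays the role of the intercept
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# 1) closed-form solution (normal equation): theta = (X^T X)^(-1) X^T y
theta_closed = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y

# 2) batch gradient descent on the squared-error loss
theta = np.zeros(Xb.shape[1])
eta = 0.1                                   # learning rate
for _ in range(5000):
    grad = 2 / len(y) * Xb.T @ (Xb @ theta - y)
    theta = theta - eta * grad

print(theta_closed)                         # both should be close to [3, 2, -1, 0.5]
print(theta)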

Main parameters:
Plain linear regression has essentially no hyperparameters to tune, so the model cannot be improved by adjusting parameters; its performance depends entirely on the data. In addition, the features should be normalized/standardized before fitting.
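For example, a minimal standardization sketch (assuming Xtrain/Xtest/Ytrain splits like the ones used in the snippet below):

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

scaler = StandardScaler().fit(Xtrain)       # fit the scaler on the training set only
Xtrain_s = scaler.transform(Xtrain)
Xtest_s = scaler.transform(Xtest)           # reuse the training statistics on the test set

reg = LinearRegression().fit(Xtrain_s, Ytrain)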

Characteristics of the algorithm:
Advantages: the results are easy to interpret and the computation is not complex.
Disadvantages: it does not fit nonlinear data well, and because the exact optimal solution is sought it can easily overfit.
This is because linear regression assumes that the features and the target satisfy a linear relationship.
sklearn code:

from sklearn.linear_model import LinearRegression as LR

reg = LR().fit(Xtrain,Ytrain)
yhat = reg.predict(Xtest)
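To evaluate the fitted model, one option is the following sketch (assuming a matching Ytest for the test set; mean_squared_error and r2_score come from sklearn.metrics):

from sklearn.metrics import mean_squared_error, r2_score

print(mean_squared_error(Ytest, yhat))      # average squared residual, lower is better
print(r2_score(Ytest, yhat))                # proportion of variance explained
print(reg.coef_)                            # the fitted weights θ, one per feature
print(reg.intercept_)                       # the fitted intercept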

II. Ridge regression and Lasso regression

Ridge regression:
Brief introduction:
The closed-form solution of multiple linear regression requires inverting X^T X. If that inverse does not exist or is unstable (multicollinearity in the data: features that are completely or highly linearly correlated), the estimated θ values blow up, the resulting model is biased or simply unusable, and the predictions are poor.

Multicollinearity can be measured with the VIF (variance inflation factor). To deal with multicollinearity in linear regression, ridge regression and Lasso regression are available.
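As an aside, a sketch of computing the VIF per feature with statsmodels (assumed to be installed; X is taken to be a pandas DataFrame of features):

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)      # values far above ~10 are a common warning sign of multicollinearity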

How they work:
Ridge regression and Lasso regression both add a regularization term to the linear regression loss function: ridge adds the L2 norm of the coefficients, Lasso adds the L1 norm.
Ridge regression:
Ridge regression loss function:
min over θ of ||y - Xθ||² + α||θ||², where α >= 0 is the regularization strength and ||θ||² is the squared L2 norm of the coefficients.
After αI is added, as long as α > 0 the diagonal terms cannot be cancelled out, so the matrix X^T X + αI is guaranteed to be invertible. The value of α is set by hand.
The closed-form solution then becomes θ = (X^T X + αI)^(-1) X^T y, so for α > 0 the matrix to be inverted always exists.

Main parameter:
The regularization strength α (alpha). If α is too small, the penalty has little effect and the original problem with the coefficients ω remains; if α is too large, the regression coefficients are shifted too far and the information originally carried by ω is weakened.

If adjusting alpha produces no significant change in the model, the data has no real multicollinearity; if performance rises as alpha increases, multicollinearity is present. Choosing alpha is a bias-variance trade-off for the model and is mainly done by cross-validation. Below is ridge regression with cross-validation.
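A sketch of this with RidgeCV (the alpha grid and cv value are mine, for illustration):

from sklearn.linear_model import RidgeCV
import numpy as np

ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 100), cv=5).fit(Xtrain, Ytrain)
print(ridge_cv.alpha_)                      # the selected regularization strength
print(ridge_cv.score(Xtest, Ytest))         # R^2 on the test set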


Lasso regression:
Brief introduction:
Lasso loss function:
min over θ of ||y - Xθ||² + α||θ||_1, where ||θ||_1 is the L1 norm (sum of absolute values) of the coefficients.

Taking the derivative of this loss function, the α term is not added to X^T X, so it does not affect whether that matrix is invertible; in that sense Lasso cannot solve the multicollinearity problem itself.
In practical problems, however, data is never perfectly correlated nor perfectly independent, so the inverse is assumed to always exist.
Limiting the size of the coefficients through alpha prevents w from being overestimated because of multicollinearity, which would make the model inaccurate. Lasso cannot remove multicollinearity, but it can limit its impact. And because alpha compresses the fitted coefficients w of the original data, Lasso can be used for feature selection.

A core difference between L1 and L2 regularization is their effect on the coefficients: both shrink the coefficients, and features that contribute less to the label receive smaller coefficients and are compressed more easily.
However, L2 regularization only pushes the coefficients as close to 0 as possible, whereas L1 regularization is dominated by sparsity and therefore compresses some coefficients exactly to 0. This property has made Lasso the feature selection tool of choice among linear models.

In Lasso the alpha value is also chosen by cross-validation, using LassoCV.
Instead of passing an alpha grid directly, the regularization path is controlled with eps and n_alphas; eps is usually taken to be small.
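A sketch of LassoCV (the eps, n_alphas and cv values are illustrative):

from sklearn.linear_model import LassoCV

# eps sets how small the smallest alpha is relative to the largest one on the path,
# n_alphas sets how many alphas are placed on that path
lasso_cv = LassoCV(eps=1e-5, n_alphas=300, cv=5).fit(Xtrain, Ytrain)
print(lasso_cv.alpha_)                      # the alpha picked by cross-validation
print(lasso_cv.coef_)                       # some coefficients may be exactly 0 (feature selection)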

sklearn code:

from sklearn.linear_model import Ridge,Lasso
 
reg = Ridge(alpha=1).fit(Xtrain,Ytrain)
reg = Lasso(alpha=1).fit(Xtrain,Ytrain)

III. Polynomial regression

Linear and nonlinear data:
Linear relationship:
The word "linear" is used to describe different things in different contexts. Most commonly it means a "linear relationship between variables", i.e. the relationship between two variables can be displayed as a straight line and fitted with an equation of the form y = ax + b. This is usually checked by drawing a scatter plot.
Linear data:
Typically a dataset has several features and one label. When the features have a linear relationship with the label, the dataset is said to be linear. When the relationship between the label and any of the features has to be defined with trigonometric, exponential or similar functions, the data is called nonlinear.
The easiest way to tell linear from nonlinear data is to let a model decide for us: use logistic regression for classification or linear regression for regression; if the result is good the data is linear, otherwise it is not. Alternatively, reduce the dimensionality, plot the data, and check whether the distribution is close to a straight line.
Such a plot explores the relationship between the features and the target variable.
Regression:
For regression problems we look at whether the scatter of the data against the target lies roughly along a straight line.
Classification problems:
When we do classification, the decision function is usually a piecewise function. For binary classification, the sign function, for example, can only be drawn as the two horizontal lines at 1 and -1, which clearly cannot be represented by a single straight line, so the relationship between the features and the labels {0, 1} or {-1, 1} is strongly nonlinear, unless we fit the probability of each class instead.
Looking at a plot of such data (one feature per axis, the class shown as the color), we no longer care about fitting a line through the points; what matters is whether we can find a straight line that separates the two classes.
In summary: for regression problems, if the data lies along a straight line it is linear, otherwise it is nonlinear. For classification problems, if a straight line can divide the categories the data is linearly separable, otherwise it is linearly inseparable.
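A quick sketch of the model-based check on synthetic regression data (the data is illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)
x = rng.uniform(-3, 3, size=(200, 1))
y_line = 2 * x.ravel() + 1 + rng.randn(200) * 0.3       # straight-line relationship
y_curve = x.ravel() ** 2 + rng.randn(200) * 0.3         # curved relationship

print(LinearRegression().fit(x, y_line).score(x, y_line))      # R^2 close to 1: linear data
print(LinearRegression().fit(x, y_curve).score(x, y_curve))    # R^2 close to 0: nonlinear data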


Linear models and nonlinear models:
The linear regression model we built above is a linear model fitted on linear data.
As the typical representative of linear models, linear regression lets us read off the defining characteristic of a linear model directly from its equation: every independent variable appears only to the first power.

Fitting linear data with a nonlinear model:
A nonlinear model handles simple linear data easily: R² on the training set is very high and the training MSE very low. It works, but it overfits easily.
Fitting nonlinear data with a linear model:
A simple linear model does not fit nonlinear data well. A linear model can, however, be used after the data has been processed nonlinearly, for example by binning.
The decision boundaries of linear models are parallel straight lines, while the decision boundaries of nonlinear models are crossing straight lines (grids), curves, rings and the like. For classification models, the decision boundary is an important clue when judging whether a model is linear or nonlinear: a linear model has parallel straight lines as its decision boundary, a nonlinear model has curves or crossing straight lines. Equivalently, if the highest power of the independent variables on the decision boundary of a classification model is 1, we call the model linear.

Models that are both linear and nonlinear

SVM is an example: without a kernel, SVM is a linear model; with a kernel, it is a nonlinear model.
In two-dimensional space this is mainly seen from the decision boundary and the fitted surface, which are no longer straight lines once a kernel is used.
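The sketch below contrasts a linear and an RBF kernel on a linearly inseparable toy dataset (the dataset and parameters are illustrative):

from sklearn.svm import SVC
from sklearn.datasets import make_circles

# two concentric circles: impossible to separate with one straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

print(SVC(kernel="linear").fit(X, y).score(X, y))     # straight boundary, poor accuracy
print(SVC(kernel="rbf").fit(X, y).score(X, y))        # curved boundary, much better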


Discretization:
Continuous variables can be discretized by binning in order to improve the performance of linear regression on the raw data.

from sklearn.preprocessing import KBinsDiscretizer

# bin the continuous feature into discrete intervals
enc = KBinsDiscretizer(n_bins=10          # number of bins
					,encode="onehot")     # one-hot encode the bin membership
X_binned = enc.fit_transform(X)           # X is the original feature matrix

Furthermore, the number of bins affects the quality of the fit, and the optimal number of bins can be selected by cross-validation, as sketched below. Another, more general, way of handling nonlinear data is polynomial regression, whose main idea, like the kernel in SVM, is to raise the data into a higher dimension.
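A sketch of cross-validating the number of bins (assuming X and y hold the one-dimensional nonlinear data being binned; the candidate bin counts are illustrative):

from sklearn.preprocessing import KBinsDiscretizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

scores = {}
for n_bins in [5, 10, 15, 20, 25, 30]:
    model = make_pipeline(KBinsDiscretizer(n_bins=n_bins, encode="onehot"),
                          LinearRegression())
    scores[n_bins] = cross_val_score(model, X, y, cv=5).mean()
print(scores)                               # pick the bin count with the best score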


Polynomial regression:
Polynomial regression raises the degree of the independent variables (a polynomial transformation), increasing the dimensionality of the data so that a linear model can achieve a nonlinear fit.
In other words, the polynomial transformation adjusts the original data so that a linear model can fit it.

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.arange(1,4).reshape(-1,1)
poly = PolynomialFeatures(degree=2)
X_ = poly.fit_transform(X)
X_

The result is the following array (the columns are the bias term 1, x, and x²):

[[1. 1. 1.]
 [1. 2. 4.]
 [1. 3. 9.]]
We can now fit this with linear regression. Linear regression assigns a weight to each feature, so when we fit the higher-dimensional data we get a model of the form ŷ = w0*1 + w1*x + w2*x², where the column of ones plays the role of x0.
The computer, however, does not know which column is the intercept and which are ordinary features, so x0 is simply given a coefficient like any other column. For that reason the separate intercept is generally not fitted.
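A small sketch of this on the X_ generated above (the target values are toy numbers chosen only for illustration):

from sklearn.linear_model import LinearRegression

# X_ already contains the bias column of ones, so the separate intercept is disabled
lin = LinearRegression(fit_intercept=False).fit(X_, [2, 5, 10])
print(lin.coef_)          # one weight per generated feature, including the "1" column
print(lin.intercept_)     # 0.0, because fit_intercept=False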
The above is the one-dimensional case; the two-dimensional case works in the same way, generating 1, x1, x2, x1², x1*x2, x2² for degree 2 (see the sketch after the next paragraph).
When we apply a polynomial transformation, the polynomial generates all terms from degree 1 up to the specified highest degree. For example, if the degree is set to 2, it returns all terms of degree 1 and degree 2; if the degree is set to n, it outputs all terms from degree 1 up to degree n.
In polynomial regression we can in fact choose whether the pure squares and cubes are generated. If we keep only the interaction terms, the collinearity between x1*x2 and x1 is slightly (only slightly) lower than the collinearity between x1*x1 and x1; and since the transformed data still has to be fitted with a linear regression model, too much collinearity can affect the fit, even though machine learning is not very strict about such basic assumptions. sklearn therefore provides the parameter interaction_only to control whether the pure square and cubic terms are generated; its default is False, and setting it to True can reduce collinearity.
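A small sketch of the difference (the toy input is mine; it just makes the generated columns easy to read):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X2 = np.array([[2, 3],
               [4, 5]])

# degree=2 with all terms: 1, x1, x2, x1^2, x1*x2, x2^2
print(PolynomialFeatures(degree=2).fit_transform(X2))

# interaction_only=True drops the pure powers x1^2 and x2^2
print(PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X2))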

The polynomial regression equation is determined by the dimensionality of the original data and by the degree we set, so polynomial regression has no fixed model expression.

Polynomial regression on a nonlinear problem:

from sklearn.preprocessing import PolynomialFeatures as PF
from sklearn.linear_model import LinearRegression
import numpy as np

# sample nonlinear data, assumed here so the snippet runs on its own
rnd = np.random.RandomState(42)
X = rnd.uniform(-3, 3, size=100).reshape(-1, 1)
y = np.sin(X).ravel() + rnd.normal(size=100) * 0.3
line = np.linspace(-3, 3, 1000).reshape(-1, 1)   # evenly spaced points for drawing the fit

d = 5
# transform the data to higher-order terms
poly = PF(degree=d)
X_ = poly.fit_transform(X)
line_ = PF(degree=d).fit_transform(line)
# fit the training data
LinearR_ = LinearRegression().fit(X_, y)
LinearR_.score(X_, y)

Polynomial regression fits nonlinear data much better and is not particularly prone to overfitting; it can be said to keep the "not easy to overfit" and "fast to compute" properties of the linear regression model while achieving an excellent fit on nonlinear data.

Interpretability of polynomial regression:
After the polynomial transformation the equation still has the form of a linear regression, but as the dimensionality of the data and the degree of the polynomial rise the equation becomes very complex, and we may no longer be able to tell which of the original features a transformed feature was built from; each original feature is expanded into several new ones.
Polynomial regression nevertheless remains interpretable: we can call get_feature_names (get_feature_names_out in newer sklearn versions) to obtain the names of the features in the newly generated feature matrix and use them to explain the model.
The linear model may not perform as well as a random forest, but it runs fast...

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
X = np.arange(9).reshape(3, 3)
poly = PolynomialFeatures(degree=5).fit(X)
# important interface: get_feature_names
poly.get_feature_names()    # print the generated feature names
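To actually use the names for interpretation, one option is a sketch like the following (the target values are toy numbers, only to make the snippet runnable):

from sklearn.linear_model import LinearRegression

X_poly = poly.transform(X)
reg = LinearRegression(fit_intercept=False).fit(X_poly, [1.0, 2.0, 3.0])
for name, coef in zip(poly.get_feature_names(), reg.coef_):
    print(name, round(coef, 3))             # each generated feature with its weight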

Is this a linear or a nonlinear model?
Nonlinear: written in terms of the original features, the model contains squares, cubes and interaction terms of x, so it is not linear in x.
After renaming each generated term as a new feature: the same equation is linear in those new features.
To understand this, distinguish between linear models in the narrow sense and generalized linear models.

Linear model in the narrow sense: the independent variables may only appear to the first power, and no nonlinear relationship may exist between the independent variables and the label (the independent variables are of degree one).
Generalized linear model: as long as the relationship between the label and the fitted model parameters is linear, the model is linear. That is, as long as the coefficients w are not multiplied or divided by one another, we consider the model linear (the coefficients w are of degree one).
To summarize, the polynomial regression model is usually considered nonlinear, but it is a special kind of generalized linear model; it helps us deal with nonlinear data and is an evolution of linear regression.
An aggressive polynomial transformation greatly increases the data dimensionality, which also increases the likelihood of overfitting; the polynomial transformation can therefore be combined with linear models that handle multicollinearity, such as ridge regression or Lasso, used in exactly the same way as with plain linear regression.
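A minimal pipeline sketch (the degree and alpha are illustrative and would normally be tuned by cross-validation; Xtrain/Ytrain/Xtest/Ytest are assumed splits like the ones used earlier):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                      Ridge(alpha=1.0))
model.fit(Xtrain, Ytrain)
print(model.score(Xtest, Ytest))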

IV. Summary

Multiple linear regression, ridge regression, Lasso and polynomial regression: four algorithms in total, all of them extensions of and improvements on plain linear regression.
Ridge regression and Lasso address the limitations of solving multiple linear regression with least squares; their main purpose is to eliminate the influence of multicollinearity, and Lasso can also be used for feature selection.
Polynomial regression addresses the obvious shortcoming that linear regression cannot fit nonlinear data; its core role is to improve model performance.
I am still in the middle of learning this, so these are just my notes; I will tidy them up when I have time.



Origin blog.csdn.net/AvenueCyy/article/details/104531239