Python Machine Learning and Practice 2: Basics 2

Classic Supervised Learning Models

The task of a supervised learning model in machine learning is to predict the labels of unknown samples based on existing empirical knowledge. According to the type of prediction target, supervised learning tasks are roughly divided into two categories: classification learning and regression prediction.

We covered classification learning earlier; here we briefly introduce regression prediction.

Regression Prediction

What distinguishes a regression problem from a classification problem is that the target to be predicted is a continuous variable, such as price or precipitation.

Since I do not use regression prediction much, I will not go into great detail. Personally, I find the underlying ideas of regression and classification quite similar; they simply serve different purposes.

So, briefly, regression prediction covers the following models:

Linear Regressor

Model introduction: The "Linear Classifier" section mainly introduced linear models used for classification, where the logistic function was brought in to map the raw real-valued output into the (0, 1) interval. In the linear regression problem, the prediction target is itself a value in the real number domain, so the optimization objective is simpler: minimize the difference between the predicted and true values, fitting the discrete data as closely as possible with a linear function.
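For concreteness, here is a minimal sketch of fitting a linear regressor with scikit-learn; the synthetic dataset built with make_regression and the train/test split are illustrative assumptions, not taken from the original text.

```python
# Minimal linear regression sketch (illustrative synthetic data).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Toy regression problem: one feature, additive Gaussian noise.
X, y = make_regression(n_samples=200, n_features=1, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)            # minimizes squared error of y against Xw + b
y_pred = lr.predict(X_test)
print('Test R^2:', r2_score(y_test, y_pred))
```

The fitted slope lr.coef_ and intercept lr.intercept_ define the linear function that best fits the training data in the least-squares sense.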

Support Vector Machines (Regression)

Model introduction: Readers presumably have some understanding of the mechanism behind the classification model described in Support Vector Machines (Classification). The support vector machine (regression) introduced in this section likewise selects a subset of more effective support vectors from the training data; the difference is that these training samples supply specific prediction values rather than category labels.
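As a sketch of how this looks in code, the snippet below compares a few SVR kernels on standardized synthetic data; the dataset, kernel list, and default hyperparameters are assumptions for illustration only.

```python
# Comparing SVR kernels on a synthetic, standardized dataset.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVR is sensitive to feature (and target) scale, so standardize both.
x_scaler = StandardScaler().fit(X_train)
X_train, X_test = x_scaler.transform(X_train), x_scaler.transform(X_test)
y_scaler = StandardScaler().fit(y_train.reshape(-1, 1))
y_train = y_scaler.transform(y_train.reshape(-1, 1)).ravel()
y_test = y_scaler.transform(y_test.reshape(-1, 1)).ravel()

for kernel in ('linear', 'poly', 'rbf'):
    svr = SVR(kernel=kernel)
    svr.fit(X_train, y_train)
    print(kernel, 'R^2:', svr.score(X_test, y_test))
```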

K-Nearest Neighbors (Regression)

Model introduction: The K-nearest neighbors (classification) section mentioned that this type of model requires no parameter training. In the regression task, the K-nearest neighbors (regression) model likewise uses only the target values of the K nearest training samples around the test sample to decide its regression value. Naturally, this gives rise to different ways of computing that value: either take the ordinary arithmetic mean of the K neighbors' target values, or take a weighted average that accounts for differences in distance. This section therefore initializes K-nearest neighbor (regression) models under both configurations to compare their regression performance.
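A minimal sketch of the two configurations discussed above, using scikit-learn's KNeighborsRegressor on assumed synthetic data: weights='uniform' takes the plain arithmetic mean of the K neighbors' targets, while weights='distance' weights them by inverse distance.

```python
# K-nearest neighbors regression: uniform vs. distance-weighted averaging.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=4, noise=20.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for weights in ('uniform', 'distance'):
    knr = KNeighborsRegressor(n_neighbors=5, weights=weights)
    knr.fit(X_train, y_train)       # no parameters learned; training data is stored
    print(weights, 'R^2:', knr.score(X_test, y_test))
```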

Regression Tree

Model introduction: The regression tree is similar to the decision tree in its strategy of selecting different features as split nodes. The difference is that the data at a regression tree's leaf nodes is continuous rather than discrete. Each leaf node of a decision tree determines its final prediction category according to the probabilistic tendency of the training data, whereas each leaf node of a regression tree holds a specific value. Strictly speaking, in the sense that predicted values should be continuous, the regression tree cannot quite be called a "regression algorithm", because its leaf nodes return the mean of a "cluster" of training data rather than a specific, continuous predicted value.
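The sketch below fits a small regression tree on assumed synthetic data; max_depth=4 is an illustrative choice to keep the tree shallow, not a recommendation from the text.

```python
# Regression tree: each leaf predicts the mean target of its training samples.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=4, noise=20.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

dtr = DecisionTreeRegressor(max_depth=4, random_state=2)
dtr.fit(X_train, y_train)
print('Test R^2:', dtr.score(X_test, y_test))
```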

Ensemble Models (Regression)

Model introduction: The Ensemble Models (Classification) section already discussed the general types and advantages of ensemble models. In addition to reusing the regressor versions of the ordinary random forest and boosted tree models, this section introduces another variant of the random forest model: extremely randomized trees. Unlike an ordinary random forest, an extremely randomized forest does not arbitrarily select features when constructing each tree's split nodes; instead, it first randomly collects a subset of the features and then picks the best split feature according to metrics such as information entropy and Gini impurity.
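To compare the three ensemble regressors named above, here is a minimal sketch; the shared synthetic dataset and the models' default settings are illustrative assumptions.

```python
# Random forest, gradient boosted trees, and extremely randomized trees.
from sklearn.datasets import make_regression
from sklearn.ensemble import (ExtraTreesRegressor,
                              GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

for model in (RandomForestRegressor(random_state=3),
              GradientBoostingRegressor(random_state=3),
              ExtraTreesRegressor(random_state=3)):
    model.fit(X_train, y_train)
    print(type(model).__name__, 'R^2:', model.score(X_test, y_test))
```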

