Classification Models for Machine Learning

Overview

A machine learning classification model learns from a training set and establishes a mapping from an input space X to an output space Y of discrete values (labels). According to the form of the output categories (labels), classification can be divided into binary classification (Binary Classification), multi-class classification (Multi-Class Classification), and multi-label classification (Multi-Label Classification). Commonly used classification algorithms include logistic regression, KNN, decision trees, random forests, naive Bayes, etc. The specific classification models are introduced below.

Classification Models

Logistic Regression

Logistic regression is a classification model built on top of linear regression. Since the output of linear regression is a continuous value whose range cannot be bounded, it cannot be used directly as a basis for classification. A mapping function (such as the Sigmoid function) is therefore used to map the continuous output to the interval (0, 1); the resulting value can be interpreted as a probability and used as the basis for the model's classification decision.
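As a quick illustration, here is a minimal sketch (assuming scikit-learn and a synthetic data set, neither of which comes from the original article) showing that the predicted probability of LogisticRegression is exactly the Sigmoid applied to the linear score:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def sigmoid(z):
    # squashes any real-valued score into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical toy data set for illustration only
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# linear score w·x + b, then Sigmoid -> matches predict_proba for class 1
scores = X_test @ clf.coef_.ravel() + clf.intercept_[0]
print(np.allclose(sigmoid(scores), clf.predict_proba(X_test)[:, 1]))
print("test accuracy:", clf.score(X_test, y_test))
```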

Nearest Neighbor Classification

Nearest neighbor classification (K-Nearest Neighbors, KNN) is a commonly used classification method. Its core idea is to find the k closest points to a sample in the feature space and assign the sample to a category according to the votes of those k neighbors. In the KNN algorithm, the selected neighbors are objects that have already been correctly labeled. Commonly used distance functions include the Manhattan distance, the Euclidean distance, and the Minkowski distance.
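A minimal sketch, assuming scikit-learn and the Iris data set (neither is mentioned in the original); the metric and p parameters switch between the Manhattan (p=1), Euclidean (p=2), and general Minkowski distances listed above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k = 5 neighbors vote on the class; p=2 makes Minkowski the Euclidean distance
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```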

Naive Bayes

The naive Bayes classifier is based on Bayes' theorem together with the assumption of conditional independence between features. Bayes' theorem is as follows:
$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A \mid B_j)}$$
where $P(\cdot)$ denotes the probability of an event occurring, and $P(A \mid B)$ is the probability that A occurs given that B has occurred.
The conditional independence assumption means that even if the features actually depend on one another or on other features, the naive Bayes algorithm still treats them as independent given the class. Naive Bayes learns the joint probability distribution of inputs and outputs from the given training set, and then, for a new input, outputs the class that maximizes the posterior probability under the learned model.
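A minimal sketch, again assuming scikit-learn and the Iris data set (not part of the original text): GaussianNB estimates the class priors $P(B_i)$ and the per-feature likelihoods $P(A \mid B_i)$ (treated as independent given the class), then predicts the class with the largest posterior probability:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("class priors P(B_i):", nb.class_prior_)
print("posteriors for one sample:", nb.predict_proba(X_test[:1]))
print("test accuracy:", nb.score(X_test, y_test))
```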

Support Vector Machines

Support Vector Machine (SVM) turns the classification problem into the problem of finding a separating hyperplane: it maps the sample space to a high-dimensional feature space and classifies by maximizing the distance between the boundary points and the separating hyperplane. The basic idea of SVM learning is to solve for the separating hyperplane that correctly divides the training data and has the largest geometric margin. For a linearly separable data set there are infinitely many hyperplanes that separate the classes, but the separating hyperplane with the largest geometric margin is unique.
SVM uses a kernel function to map data from the low-dimensional space to a high-dimensional space implicitly, which avoids the heavy computation that an explicit high-dimensional mapping would require; after this mapping, data that was not linearly separable in the original space may become separable, thus sidestepping the computational burden often referred to as the "curse of dimensionality".
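A minimal sketch, assuming scikit-learn and a synthetic "concentric circles" data set (an illustrative choice, not from the original): the RBF kernel lets SVC separate data that is not linearly separable in the original 2-D space, without ever computing the high-dimensional mapping explicitly:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# concentric circles are not linearly separable in the original 2-D space
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))
```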

Decision Tree

A decision tree is a classification model built with a tree structure. Starting from the root node, the algorithm repeatedly splits the data set into smaller subsets according to certain conditions, eventually growing into a tree consisting of decision nodes (the root node and internal nodes) and leaf nodes. As the depth of the tree increases, the subsets at the branch nodes become smaller and smaller and the decision conditions become simpler. When the depth of a branch node or its splitting condition meets a certain stopping rule, the node stops splitting; this is the top-down cutoff-threshold (Cutoff Threshold) approach. In addition, there is also bottom-up pruning (Pruning). During classification, an input sample passes through the decision nodes of the tree and enters different branches according to its attribute values until it reaches a leaf node, which completes the classification.
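A minimal sketch, assuming scikit-learn and the Iris data set (not from the original): max_depth and min_samples_leaf play the role of the top-down stopping rules described above, and export_text prints the learned decision conditions node by node:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# depth and leaf-size limits act as the stopping (cutoff threshold) rules
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
tree.fit(X_train, y_train)
print(export_text(tree))  # the learned decision rules, node by node
print("test accuracy:", tree.score(X_test, y_test))
```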

Random Forest

A random forest is a model composed of multiple decision trees, with no connection between the different trees. When performing a classification task, a new input sample is fed into every decision tree in the forest, each tree produces a classification result according to its own decision conditions, and the final result is determined by majority voting.
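A minimal sketch, assuming scikit-learn and the Iris data set (not from the original): RandomForestClassifier trains many independent trees on bootstrap samples and aggregates their votes into the final prediction:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each fit on a bootstrap sample; predictions are combined by vote
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("number of trees:", len(forest.estimators_))
print("test accuracy:", forest.score(X_test, y_test))
```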

Multilayer Perceptron

The Multilayer Perceptron (MLP) is a feed-forward artificial neural network that imitates how sensory neurons propagate signals forward layer by layer. Its basic structure generally consists of three parts: an input layer, hidden layers, and an output layer. During training, the backpropagation algorithm, combined with an optimizer such as gradient descent, is used to adjust the weights so as to reduce the training error, that is, the deviation between the true values and the predicted values.
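A minimal sketch, assuming scikit-learn and the Iris data set (not from the original): MLPClassifier with one hidden layer adjusts its weights by backpropagation with a gradient-based optimizer; feature standardization is an added assumption, since neural networks are sensitive to feature scale:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# standardize features first; then train input -> hidden(32) -> output
scaler = StandardScaler().fit(X_train)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)
print("test accuracy:", mlp.score(scaler.transform(X_test), y_test))
```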

Classification Model Based on Ensemble Learning

Ensemble Learning is a powerful technique that improves accuracy on many machine learning tasks by combining multiple base classifiers to complete the learning task. A single model is prone to overfitting or underfitting, and every model has its own strengths and weaknesses by design; model fusion based on ensemble learning therefore lets the models complement one another. Commonly used fusion schemes include Voting, Bagging, Stacking, Blending, and Boosting.

Voting

Voting adopts the principle of the minority obeying the majority: the predictions of multiple classifiers are put to a vote, which can be either plain voting or weighted voting. The weights in weighted voting can be set manually or derived from each model's evaluation score. The voting method usually requires three or more models, and to avoid biased voting results it is necessary to ensure diversity among the models.
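A minimal sketch, assuming scikit-learn, the Iris data set, and an arbitrary choice of three base models and weights (all illustrative assumptions): VotingClassifier with voting="hard" takes a majority vote on predicted labels, and the weights argument implements the weighted voting described above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# three diverse base models; labels are combined by a weighted majority vote
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="hard", weights=[2, 1, 1],
)
vote.fit(X_train, y_train)
print("test accuracy:", vote.score(X_test, y_test))
```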

Bagging

In the Voting method, every base classifier is trained on the same full set of samples, whereas in Bagging each classifier is trained on a different random sample drawn from all the samples; everything else is exactly the same. This avoids homogeneous training results, improves the accuracy of unstable models, and reduces overfitting.
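A minimal sketch, assuming scikit-learn, the Iris data set, and decision trees as the base estimator (illustrative assumptions): BaggingClassifier draws a bootstrap sample for each base estimator, so every tree sees different training data, and their predictions are then aggregated:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 trees, each trained on its own bootstrap sample of the training set
bag = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),
    n_estimators=50, bootstrap=True, random_state=0,
)
bag.fit(X_train, y_train)
print("test accuracy:", bag.score(X_test, y_test))
```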

Stacking

Stacking is a hierarchical model-ensembling framework: the predictions produced by several base classifiers are used as a new training set for a further learner. Taking a two-layer Stacking framework as an example, the first layer consists of multiple base learners whose input is the original training set, while the second-layer model takes the predictions of the first-layer base learners as its features and learns to produce the final output.
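A minimal sketch, assuming scikit-learn, the Iris data set, and an arbitrary choice of base and meta learners (illustrative assumptions): in StackingClassifier the estimators form the first layer, and final_estimator is the second-layer meta-learner trained on their out-of-fold predictions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()), ("nb", GaussianNB())],  # layer 1
    final_estimator=LogisticRegression(max_iter=1000),                   # layer 2
    cv=5,  # out-of-fold predictions become the meta-learner's training features
)
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```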

Blending
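Blending is similar to Stacking: the base learners' predictions are used to train a second-layer model, but those predictions are made on a held-out validation set rather than obtained through cross-validated (out-of-fold) prediction, which is simpler to implement but uses less of the training data.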

Boosting
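Boosting trains base learners sequentially rather than independently: each new learner pays more attention to the samples that the previous learners handled poorly, and the weighted combination of all learners forms the final classifier. Representative algorithms include AdaBoost and gradient boosting.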
