Summary of common classification methods in machine learning

I am a beginner just getting started. I hope to record what I have learned, like taking notes, and I hope it also helps others who are getting started.

Table of contents

1. Common methods and their core

1. Linear discriminant analysis

2. Logistic regression

3. Bayesian classifier

4. Decision tree

5. SVM

2. The advantages, disadvantages and application of these common methods

1. Linear discriminant analysis

(1) Advantages

(2) Disadvantages

2. Logistic regression

(1) Advantages

(2) Disadvantages

3. Bayesian classifier

(1) Advantages

(2) Disadvantages

4. Decision tree

(1) Advantages

(2) Disadvantages

5. SVM

(1) Advantages

(2) Disadvantages

6. Applicable methods in different situations

(1) Methods recommended when the data is linearly inseparable

(2) Methods recommended when prior probability information is available

(3) Methods recommended when the data distribution is unknown

(4) Methods not recommended when there are many feature attributes

3. Intercommunication between Naive Bayesian Classifier and Logistic Regression

4. Two classifications to multiple classifications

5. Category imbalance problem


1. Common methods and their core

1. Linear discriminant analysis

Project all samples onto a one-dimensional axis (a dimensionality-reduction view), then set a threshold on that axis to separate the classes. The projection criterion: the between-class distance should be large and the within-class distance small.

Take two categories as an example:

Goal: maximize the generalized Rayleigh quotient

J=\frac{\boldsymbol{w}^{T} S_{b} \boldsymbol{w}}{\boldsymbol{w}^{T} S_{w} \boldsymbol{w}}

where S_{b} is the between-class scatter matrix and S_{w} is the within-class scatter matrix.

Final result:

\boldsymbol{w}=S_{w}^{-1}\left(\boldsymbol{\mu}_{0}-\boldsymbol{\mu}_{1}\right)

 See Linear Discriminant Analysis (LDA) for details
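As a minimal sketch of the idea above (the toy data and scikit-learn usage are my own illustration, not from the original post): LDA maps the samples onto one axis and a threshold on that axis classifies them.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two toy classes in 2-D: class 0 centered at (0, 0), class 1 at (3, 3)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# LDA projects onto the direction that maximizes between-class scatter
# relative to within-class scatter, then thresholds on that axis
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)   # all samples mapped to one axis
acc = lda.score(X, y)            # classification by thresholding
print(X_1d.shape, acc)
```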

2. Logistic regression

Apply linear regression to a classification task: the sigmoid function squashes the linear output into (0, 1), which is read as the probability of class 1, and the predicted category is chosen by comparing this probability against a threshold. (For multi-class problems, the sigmoid is replaced with softmax.)

The best-fitting w and b are obtained by maximum likelihood estimation.

Goal formula (two categories): maximize the log-likelihood

\ell(\boldsymbol{w}, b)=\sum_{i=1}^{m}\left(y_{i}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{i}+b\right)-\ln \left(1+e^{\boldsymbol{w}^{T} \boldsymbol{x}_{i}+b}\right)\right)

For details, see Logistic Regression (Logistic Regression)
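A small sketch of the sigmoid-as-probability idea (the data and scikit-learn usage are my own illustration): the model's class-1 probability is exactly the sigmoid applied to the fitted linear score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # squashes the linear score w*x + b into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 1)), rng.normal(2, 1, (50, 1))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)   # fits w, b by maximum likelihood
w, b = clf.coef_[0, 0], clf.intercept_[0]

# Manual sigmoid of the linear score matches the model's probabilities
p_manual = sigmoid(w * X[:, 0] + b)
p_sklearn = clf.predict_proba(X)[:, 1]
same = np.allclose(p_manual, p_sklearn)
print(same)
```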

3. Bayesian classifier

Assume a distribution for the sample data; use Bayesian decision theory, maximum likelihood estimation, and Laplace smoothing to find the best-fitting distribution parameters and obtain the final classifier (the Gaussian distribution is the most common assumption).

For details, see Bayesian Classifier (ttya's blog, CSDN)
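A sketch of the Gaussian-assumption case described above (toy data and scikit-learn usage are my own illustration): per class, fit a Gaussian to each feature by maximum likelihood, then classify with Bayes' rule.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Gaussian naive Bayes: MLE estimates of per-class feature means and
# variances, then argmax_c P(c) * prod_j p(x_j | c) at prediction time
nb = GaussianNB().fit(X, y)
print(nb.theta_)        # per-class feature means estimated by MLE
print(nb.score(X, y))
```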

4. Decision tree

Use information entropy and information gain to select the decision nodes and build the decision tree.

For details, see Detailed Explanation of Decision Trees (ttya's blog, CSDN)
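The entropy and information-gain computation mentioned above can be sketched as follows (the toy labels and split are made up for illustration):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    # H = -sum_k p_k * log2(p_k) over the class proportions p_k
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, groups):
    # gain = H(parent) - weighted sum of child entropies after a split
    n = len(labels)
    child = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - child

y = ['yes'] * 9 + ['no'] * 5                 # 9 positive, 5 negative
split = [['yes'] * 6 + ['no'] * 2,           # candidate split into two
         ['yes'] * 3 + ['no'] * 3]           # child nodes
print(round(entropy(y), 3))                  # ≈ 0.940
print(round(information_gain(y, split), 3))  # ≈ 0.048
```

The attribute whose split yields the highest gain becomes the decision node.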

5. SVM

Find the separating hyperplane (linear) with the largest margin; a kernel function implicitly transforms the data so that nonlinearly separable data can also be divided.

The solution depends on the samples only through inner products.

For details, see Detailed Explanation of the SVM Model (ttya's blog, CSDN)


2. The advantages, disadvantages and application of these common methods

1. Linear discriminant analysis

(1) Advantages

Fast;

Prior knowledge about the classes can be used during dimensionality reduction;

(2) Disadvantages

LDA is not suitable for reducing the dimensionality of samples that are not Gaussian-distributed;

LDA can reduce the dimensionality to at most N-1, where N is the number of classes; if a target dimensionality greater than N-1 is needed, LDA cannot be used;

LDA can overfit the data;

2. Logistic regression

(1) Advantages

Well suited to classification scenarios;

Low computational cost; easy to understand and implement;

No need to assume a data distribution in advance, which avoids the problems caused by inaccurate assumptions;

Predicts not only the category but also an approximate probability;

The objective function is differentiable to any order;

(2) Disadvantages

Prone to underfitting; classification accuracy may be low;

Performs poorly when data features are missing or the feature space is very large;

3. Bayesian classifier

(1) Advantages

Simple and efficient to learn;

Small time and space overhead during classification;

Prior information can be used;

(2) Disadvantages

Sensitive to the assumed independence between variables and to the assumed distribution; if these assumptions are inaccurate, the classification results are strongly affected.

4. Decision tree

(1) Advantages

Relatively simple;

Can handle nonlinear classification problems;

When applied to complex multi-stage decisions, the stages and levels are clear;

(2) Disadvantages

Prone to overfitting;

Limited in scope: it cannot be applied to decisions that cannot be expressed quantitatively;

Estimating the occurrence probabilities of the various branches is sometimes highly subjective, which can lead to wrong decisions;

5. SVM

(1) Advantages

A kernel function can map the data to a high-dimensional space, solving nonlinear classification;

The classification idea is simple: maximize the margin between the samples and the decision surface;

Classification performance is good;

(2) Disadvantages

Training is difficult on large-scale data;

Multi-class classification cannot be done directly, but indirect schemes (one-vs-one, one-vs-rest) can be used;

6. Applicable methods in different situations

(1) Methods recommended when the data is linearly inseparable

Decision tree, SVM (kernel function), Bayesian classifier

(2) Methods recommended when prior probability information is available

Bayesian Classifiers, Linear Discriminant Analysis

(3) Methods recommended when the data distribution is unknown

Logistic Regression, Decision Trees, SVM

(4) Methods not recommended when there are many feature attributes

(It is advisable to first delete feature attributes that vary little or have little influence, i.e. low relevance.)

SVM (computationally heavy)
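One common way to "delete feature attributes that vary little", sketched here with scikit-learn's variance filter (the toy data and threshold are my own illustration):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(4)
X = np.hstack([
    rng.normal(0, 1, (100, 3)),                               # varying features
    np.full((100, 2), 5.0) + rng.normal(0, 0.01, (100, 2)),   # near-constant
])

# Drop features whose variance falls below the threshold,
# i.e. attributes with "little change"
selector = VarianceThreshold(threshold=0.1)
X_reduced = selector.fit_transform(X)
print(X.shape, X_reduced.shape)   # (100, 5) -> (100, 3)
```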


3. Intercommunication between Naive Bayesian Classifier and Logistic Regression

Take two categories as an example:

Note: this derivation assumes the attributes are independent of each other (the naive Bayes assumption).

In logistic regression, we know that

\ln \frac{p(y=1 \mid x)}{p(y=0 \mid x)}=\boldsymbol{w}^{T} \boldsymbol{x}+b

And Bayes' theorem tells us:

P(Y=1|X) = \frac{P(X|Y=1)P(Y=1)}{P(X)}

P(Y=0|X) = \frac{P(X|Y=0)P(Y=0)}{P(X)}

Dividing the two equations and taking the logarithm:

-\ln \frac{p(y=1 \mid x)}{p(y=0 \mid x)}=\ln \frac{p(x \mid y=0) p(y=0)}{p(x \mid y=1) p(y=1)}

That is:

\ln \frac{p(x|y=0)p(y=0)}{p(x|y=1)p(y=1)} = -(w^{T}x+b)

Substituting back into the basic Bayes formula:

P(Y=1|X) = \frac{P(X|Y=1)P(Y=1)}{P(X)} = \frac{P(X|Y=1)P(Y=1)}{P(X|Y=1)P(Y=1)+P(X|Y=0)P(Y=0)} = \frac{1}{1+e^{-(w^{T}x+b)}}

Isn't the last one our logistic regression?
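The equivalence can be checked numerically. Below is a sketch with one feature and Gaussian class-conditional densities sharing a variance (the parameter values are made up for illustration): the Bayes posterior coincides exactly with a sigmoid of a linear function of x.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # Gaussian density, used as the class-conditional p(x | y)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

mu0, mu1, sigma, p1 = -1.0, 2.0, 1.5, 0.4   # assumed toy parameters
p0 = 1.0 - p1
x = np.linspace(-5.0, 5.0, 201)

# Posterior P(y=1 | x) straight from Bayes' theorem
num = gauss_pdf(x, mu1, sigma) * p1
posterior = num / (num + gauss_pdf(x, mu0, sigma) * p0)

# The same posterior written as sigmoid(w*x + b); expanding the squares
# in the log-likelihood ratio gives these w and b
w = (mu1 - mu0) / sigma ** 2
b = (mu0 ** 2 - mu1 ** 2) / (2 * sigma ** 2) + np.log(p1 / p0)
logistic = 1.0 / (1.0 + np.exp(-(w * x + b)))

same = np.allclose(posterior, logistic)
print(same)
```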


4. Two classifications to multiple classifications

Two-category learning is extended to multi-category learning - Programmer Sought
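The two indirect schemes mentioned in the SVM section, one-vs-rest and one-vs-one, can be sketched with scikit-learn's wrappers (the iris data and SVC base learner are my own illustration):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes

# One-vs-rest: one binary classifier per class -> 3 here
ovr = OneVsRestClassifier(SVC()).fit(X, y)
# One-vs-one: one classifier per class pair -> C(3, 2) = 3 here
ovo = OneVsOneClassifier(SVC()).fit(X, y)
print(len(ovr.estimators_), len(ovo.estimators_))
```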


5. Category imbalance problem

The problem of category imbalance in classification tasks - Programmer Sought
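One standard remedy for class imbalance is reweighting samples inversely to class frequency; here is a sketch using scikit-learn's `class_weight='balanced'` (the 95:5 toy data is my own illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
# 95:5 imbalance: a plain classifier tends to ignore the minority class
X = np.vstack([rng.normal(0.0, 1, (950, 1)), rng.normal(2.5, 1, (50, 1))])
y = np.array([0] * 950 + [1] * 50)

plain = LogisticRegression().fit(X, y)
# 'balanced' weights each sample inversely to its class frequency,
# pulling the decision boundary back toward the minority class
balanced = LogisticRegression(class_weight='balanced').fit(X, y)

recall_plain = plain.predict(X[y == 1]).mean()
recall_bal = balanced.predict(X[y == 1]).mean()
print(recall_plain, recall_bal)
```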


Everyone is welcome to criticize and correct in the comment area, thank you~


Origin blog.csdn.net/weixin_55073640/article/details/126668382