I'm a beginner recording what I learn as notes, and I hope these notes are also useful to others who are just getting started.
Table of contents
1. Common methods and their core ideas
1. Linear discriminant analysis
2. Advantages, disadvantages and applications of these common methods
1. Linear discriminant analysis
6. Recommended methods in different situations
(1) Recommended when the data is linearly inseparable
(2) Recommended when prior probability information is available
(3) Recommended when the data distribution is unknown
(4) Not recommended when there are many feature attributes
3. The connection between the naive Bayes classifier and logistic regression
4. From binary to multi-class classification
5. The class imbalance problem
1. Common methods and their core ideas
1. Linear discriminant analysis
Project all samples onto a one-dimensional axis (a form of dimensionality reduction), then set a threshold to separate the classes. The projection criterion is: maximize the distance between classes while minimizing the distance within each class.
Take two classes as an example:
Goal: maximize J(w) = (wᵀ S_b w) / (wᵀ S_w w), where S_b is the between-class scatter matrix and S_w is the within-class scatter matrix.
Final result: w = S_w⁻¹(μ₀ − μ₁), the direction onto which the samples are projected.
See Linear Discriminant Analysis (LDA) for details
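The closed-form direction above can be sketched in a few lines of NumPy. The data below is a toy example invented for illustration; the threshold rule (midpoint of the projected class means) is one common, simple choice.

```python
import numpy as np

# Two-class LDA sketch: project onto w = Sw^{-1} (mu0 - mu1)
# (toy data invented for illustration)
X0 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]])   # class-0 samples
X1 = np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])   # class-1 samples

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
# Within-class scatter: sum of the two per-class scatter matrices
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
w = np.linalg.solve(Sw, mu0 - mu1)                    # projection direction

# Threshold: midpoint between the projected class means
threshold = ((X0 @ w).mean() + (X1 @ w).mean()) / 2

def predict(x):
    # class-0 projections lie above the threshold for this choice of w
    return 0 if x @ w > threshold else 1
```

Because Sw is positive definite, the class-0 mean always projects above the class-1 mean under this w, which is why the comparison direction in `predict` is fixed.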
2. Logistic regression
Logistic regression applies linear regression to a classification task: the sigmoid function squashes the linear output into (0, 1), which is interpreted as the probability of class 1, and the predicted class is the one with the larger probability. (For multi-class problems, the sigmoid is replaced by softmax.)
The best w and b are found by maximum likelihood estimation.
Goal formula (two classes): maximize the log-likelihood ℓ(w, b) = Σᵢ [yᵢ ln pᵢ + (1 − yᵢ) ln(1 − pᵢ)], where pᵢ = sigmoid(wᵀxᵢ + b).
See Logistic Regression for details
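Maximizing that log-likelihood by plain gradient ascent can be sketched as follows; the 1-D data, learning rate and iteration count are invented for illustration, not tuned values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D two-class data (invented for illustration)
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column

wb = np.zeros(2)
for _ in range(5000):
    p = sigmoid(Xb @ wb)                    # current P(y=1|x)
    wb += 0.1 * Xb.T @ (y - p)              # gradient of the log-likelihood
w, b = wb
# the fitted boundary sits near the midpoint x = 2.25
```

The update `Xb.T @ (y - p)` is exactly the gradient of ℓ(w, b) above, which is why no separate loss derivation is needed in the loop.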
3. Bayesian classifier
Assume a distribution for the sample data, then use Bayesian decision theory, maximum likelihood estimation and Laplace smoothing to find the best distribution parameters and obtain the final classifier (the Gaussian distribution is a common choice).
See Bayesian classifier for details
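A minimal naive Bayes sketch under the common Gaussian assumption: per class, fit a mean and variance per feature, then score a query by prior plus the sum of per-feature log densities (the sum is where the attribute-independence assumption enters). The data is a toy set invented for illustration.

```python
import numpy as np

def gauss_logpdf(x, mu, var):
    # log of the univariate Gaussian density, applied per feature
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

# Toy data (invented): first three rows class 0, last three class 1
X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5],
              [6.0, 5.0], [7.0, 6.0], [6.5, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

def predict(x):
    scores = []
    for c in (0, 1):
        Xc = X[y == c]
        prior = np.log(len(Xc) / len(X))
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + 1e-9  # guard zero variance
        # independence assumption: sum the per-feature log densities
        scores.append(prior + gauss_logpdf(x, mu, var).sum())
    return int(np.argmax(scores))
```

Working in log space avoids underflow when many features are multiplied together.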
4. Decision tree
Use information entropy and information gain to choose decision nodes and build the decision tree.
See the detailed explanation of decision trees for details
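The node-selection criterion can be illustrated directly: information gain is the parent's entropy minus the size-weighted entropies of the children. The labels and the candidate split below are invented for illustration.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a label vector, in bits
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

y = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # labels at the parent node
left, right = y[:4], y[4:]                # one candidate split

gain = entropy(y) - (len(left) / len(y) * entropy(left)
                     + len(right) / len(y) * entropy(right))
```

A tree builder evaluates this quantity for every candidate split and keeps the one with the largest gain.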
5. SVM
Find the separating hyperplane with the largest margin (the linear case); use a kernel function as an implicit transformation to separate nonlinear data.
The solution depends on the data only through inner products.
See the detailed explanation of the SVM model for details
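The kernel idea can be made concrete without solving the SVM optimization at all: XOR-labeled data is not linearly separable in 2-D, but adding the product feature x₁x₂ (one coordinate of the explicit map behind a degree-2 polynomial kernel) makes it separable. The hyperplane below is hand-picked for illustration, not the result of SVM training.

```python
import numpy as np

# XOR data: not linearly separable in the original 2-D space
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([1, -1, -1, 1])                      # XOR labels

# Explicit feature map: append the product feature x1*x2
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# In the mapped 3-D space, a single hyperplane separates the classes
w = np.array([0.0, 0.0, 1.0])                     # hand-picked for the demo
pred = np.sign(phi @ w)
# pred matches y: the mapped data is linearly separable
```

A real kernel SVM never builds `phi` explicitly; since the solution only needs inner products, it evaluates the kernel k(x, z) = φ(x)·φ(z) instead, which is what makes high-dimensional (even infinite-dimensional) maps tractable.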
2. Advantages, disadvantages and applications of these common methods
1. Linear discriminant analysis
(1) Advantages
Fast;
Prior knowledge about the classes can be used during dimensionality reduction;
(2) Disadvantages
LDA is not well suited to reducing the dimensionality of non-Gaussian samples;
LDA can reduce the dimensionality to at most N − 1, where N is the number of classes; if the target dimensionality is greater than N − 1, LDA cannot be used;
LDA may overfit the data;
2. Logistic regression
(1) Advantages
Well suited to classification scenarios;
Computationally cheap, easy to understand and implement;
No need to assume a data distribution in advance, which avoids the problems caused by inaccurate assumptions;
Produces not only a predicted class but also an approximate probability;
The objective function is differentiable to any order, so many numerical optimization methods can be applied;
(2) Disadvantages
Prone to underfitting, and classification accuracy is often not high;
Performs poorly when features are missing or the feature space is very large;
3. Bayesian classifier
(1) Advantages
Simple, with high learning efficiency;
Low time and space overhead during classification;
Prior knowledge can be incorporated;
(2) Disadvantages
Sensitive to the attribute-independence assumption and to the assumed distribution: if either assumption is inaccurate, the classification results suffer accordingly;
4. Decision tree
(1) Advantages
Relatively simple;
Can handle nonlinear classification problems;
When applied to complex multi-stage decisions, the stages and levels are clearly laid out;
(2) Disadvantages
Prone to overfitting;
Limited in scope: it cannot be applied to decisions that cannot be expressed quantitatively;
The probabilities assigned to the various outcomes are sometimes highly subjective, which can lead to wrong decisions;
5. SVM
(1) Advantages
A kernel function can map the data to a high-dimensional space to solve nonlinear classification;
The underlying idea is very simple: maximize the margin between the samples and the decision surface;
Classification performance is generally good;
(2) Disadvantages
Hard to train on large-scale data;
Cannot directly handle multi-class problems, though indirect schemes (one-vs-one, one-vs-rest) work;
6. Recommended methods in different situations
(1) Recommended when the data is linearly inseparable
Decision tree, SVM (with a kernel function), Bayesian classifier
(2) Recommended when prior probability information is available
Bayesian classifier, linear discriminant analysis
(3) Recommended when the data distribution is unknown
Logistic regression, decision trees, SVM
(4) Not recommended when there are many feature attributes
(It is advisable to first drop feature attributes that vary little or have little influence, i.e. weakly relevant ones)
SVM (computationally heavy)
3. The connection between the naive Bayes classifier and logistic regression
Take two classes as an example:
!!! The attributes are assumed to be mutually independent !!!
In logistic regression, we know that
P(y = 1 | x) = 1 / (1 + e^−(wᵀx + b))
And Bayes' theorem tells us:
P(y = 1 | x) = P(x | y = 1) P(y = 1) / P(x),  P(y = 0 | x) = P(x | y = 0) P(y = 0) / P(x)
Dividing the two:
P(y = 1 | x) / P(y = 0 | x) = [P(x | y = 1) P(y = 1)] / [P(x | y = 0) P(y = 0)]
That is, taking logarithms and using attribute independence:
ln [P(y = 1 | x) / P(y = 0 | x)] = ln [P(y = 1) / P(y = 0)] + Σᵢ ln [P(xᵢ | y = 1) / P(xᵢ | y = 0)]
If each attribute follows a Gaussian with the same variance in both classes, every term on the right is linear in xᵢ, so the log-odds takes the form wᵀx + b. Substituting back into the basic Bayes formula:
P(y = 1 | x) = 1 / (1 + P(y = 0 | x) / P(y = 1 | x)) = 1 / (1 + e^−(wᵀx + b))
Isn't that last expression exactly our logistic regression?
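The equivalence can be checked numerically in one dimension: compute the Bayes posterior directly from shared-variance Gaussian class-conditionals, then compare it with the sigmoid of the linear function whose coefficients follow from the derivation above. The means, variance and priors are invented for the demo.

```python
import numpy as np

# 1-D Gaussian class-conditionals sharing variance (parameters invented)
mu0, mu1, var = 0.0, 3.0, 1.0
pi0, pi1 = 0.4, 0.6                               # class priors

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def posterior(x):
    # P(y=1|x) via Bayes' theorem, no logistic form assumed
    n1 = pi1 * normal_pdf(x, mu1, var)
    return n1 / (n1 + pi0 * normal_pdf(x, mu0, var))

# Coefficients of the log-odds derived from the Gaussian parameters
w = (mu1 - mu0) / var
b = np.log(pi1 / pi0) + (mu0 ** 2 - mu1 ** 2) / (2 * var)
# posterior(x) equals sigmoid(w*x + b) for every x
```

With unequal variances, the log-odds picks up a quadratic term in x, so the posterior is no longer of plain logistic-regression form; the shared-variance assumption is essential.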
4. From binary to multi-class classification
See: Extending binary classification to multi-class learning
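The one-vs-rest scheme mentioned for SVM generalizes to any binary method: train one "class k vs the rest" scorer per class and predict the class whose scorer fires most strongly. The linear scorers below are hand-picked stand-ins for trained binary classifiers, purely for illustration.

```python
import numpy as np

# One row per "class k vs rest" scorer; in practice each row would come
# from training one of the binary methods above (weights invented here)
W = np.array([[1.0, 0.0],     # scorer for class 0
              [0.0, 1.0],     # scorer for class 1
              [-1.0, -1.0]])  # scorer for class 2

def predict(x):
    scores = W @ x            # one confidence per binary classifier
    return int(np.argmax(scores))
```

Taking the argmax of the raw scores (rather than thresholding each scorer at 0) resolves the ambiguous cases where zero or several binary classifiers claim the sample.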
5. The class imbalance problem
See: The class imbalance problem in classification tasks
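One of the simplest remedies discussed for class imbalance is random oversampling: duplicate minority-class samples until the classes are balanced. The sketch below uses toy data invented for illustration; in practice undersampling the majority class or reweighting the loss are alternatives with different trade-offs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data: 8 majority (class 0) vs 2 minority (class 1)
X = np.arange(20).reshape(10, 2).astype(float)
y = np.array([0] * 8 + [1] * 2)

# Resample minority indices (with replacement) until counts match
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=(y == 0).sum() - len(minority))
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
# y_bal now contains 8 samples of each class
```

Oversampling must be done after any train/test split, otherwise duplicated minority samples leak into the test set and inflate the measured accuracy.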
Criticism and corrections are welcome in the comments. Thank you!