Common machine learning interview questions (continuously updated)

  1. The differences and connections between SVM and LR: https://blog.csdn.net/Matrix_cc/article/details/105240748
  2. SVM derivation; why the dual problem is used; how to choose an SVM kernel function.
  3. Is SVM sensitive to missing data, and why? What about decision trees?
  4. How does a decision tree handle missing data? https://www.zhihu.com/question/34867991
  5. How does SVM handle multi-class classification?
  6. Why does SVM maximize the margin? Answer: it is more robust and generalizes better to unseen data.
  7. The SVM sample-selection problem: how to add sample points.
  8. When should you choose SVM and when a decision tree? Answer: SVM is better suited to samples with many features, whereas a decision tree is prone to overfitting when samples have many features.
  9. Is the Bayes classifier a linear classifier?
  10. Can LR be used for non-linear classification? How to understand the linearity of a linear model: LR itself is a linear model; although the sigmoid function is applied, its decision surface is still linear. Non-linear classification is possible, but it requires a non-linear feature mapping first (see the sketch below). Reference: "Can logistic regression solve nonlinear classification problems?", Xin Junbo's answer on Zhihu.
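
A minimal sketch of the idea, assuming scikit-learn is available (the dataset, `PolynomialFeatures`, and the degree are illustrative choices, not from the original answer): the inputs are passed through a non-linear feature mapping, and a plain linear LR is fitted on the expanded features.

```python
# Hypothetical sketch: LR on non-linearly mapped features (scikit-learn).
# make_circles and degree=2 are illustrative choices.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

# Plain LR: the decision boundary is linear in X, so concentric circles are not separable.
linear_lr = LogisticRegression().fit(X, y)

# LR on polynomial features: linear in the mapped space, non-linear in the original X.
mapped_lr = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression()).fit(X, y)

print("plain LR accuracy :", linear_lr.score(X, y))
print("mapped LR accuracy:", mapped_lr.score(X, y))
```
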
  11. Logistic regression is a classification model; why is it called "regression"?
  12. Can LR use MSE as its loss function? If not, why not?
  13. How does LR handle overfitting? 1. Reduce the number of features: 1) manually select the features to keep; 2) use a model-selection algorithm. 2. Regularization: keep all features but shrink the magnitude of the parameters θ (see the sketch below).
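
A minimal sketch of the regularization option, assuming scikit-learn (in `LogisticRegression`, `C` is the inverse regularization strength, so a smaller `C` shrinks θ more; the data here are synthetic and illustrative):

```python
# Hypothetical sketch: controlling LR overfitting with L2 regularization (scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # many features relative to the sample size
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# C is the inverse regularization strength: smaller C = stronger shrinkage of the weights.
weak_reg = LogisticRegression(penalty="l2", C=100.0, max_iter=1000).fit(X, y)
strong_reg = LogisticRegression(penalty="l2", C=0.01, max_iter=1000).fit(X, y)

# Stronger regularization keeps all features but drives the weights toward zero.
print("mean |w|, weak regularization  :", np.abs(weak_reg.coef_).mean())
print("mean |w|, strong regularization:", np.abs(strong_reg.coef_).mean())
```
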
  14. Compare LR and GBDT; under what circumstances is GBDT inferior to LR?

    1. Compare LR and GBDT:

    (1) LR is a linear model and GBDT is a non-linear tree model, so to strengthen the non-linear expressive power of an LR model there is usually a very heavy feature-engineering effort before it is used;

    (2) LR is a single model, while GBDT is an ensemble model. Generally speaking, when the data noise is low, GBDT performs better than LR;

    (3) LR is trained with gradient descent, which requires feature normalization, while GBDT selects split features and optimal split values during training based on an impurity criterion (squared error for its regression trees), so it does not require feature normalization.

    2. Where GBDT is not as good as LR:

    On the one hand, when the model needs to be interpretable, GBDT is clearly more of a "black box" than LR, because we cannot explain every single tree. By contrast, LR's feature weights directly reflect how much each feature contributes to each class; because the model is so easy to understand, in many cases we can build more convincing marketing and operations strategies from the conclusions of an LR analysis.

    On the other hand, large-scale parallel training of LR is very mature and model iteration is very fast, so business staff can quickly get feedback from the model and make targeted corrections to it. A sequential ensemble method such as GBDT is very hard to parallelize, and training is very slow at large data scales.

  15. The difference between bagging and boosting.
  16. What do low bias and high variance indicate? They indicate overfitting, so model complexity should be reduced. And the opposite? Underfitting, so model complexity should be increased.
  17. Which algorithms require normalization? https://blog.csdn.net/sinat_29508201/article/details/53056843
  18. The differences between decision trees, GBDT, and random forests.
  19. Introduce XGBoost; the differences between XGBoost and GBDT, and their advantages and disadvantages.
  20. How does XGBoost choose the best split point? What about decision trees and GBDT? 1) XGBoost splits with a greedy algorithm: two for-loops, the outer loop over all features and the inner loop over candidate feature values, scoring each candidate by the reduction in the loss function before and after the split; the feature value with the largest gain is chosen as the split point (a sketch of this greedy search follows below). 2) A classification decision tree splits using the Gini index. 3) GBDT uses regression trees: for each candidate split feature A and split point s, the data are partitioned into D1 and D2 on the two sides of s, the squared loss of each subset is computed around its mean, and the feature and split point with the smallest total squared loss over D1 and D2 are chosen.
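
A minimal sketch of the two-loop greedy search described in 1) and 3), using plain NumPy; the gain here is the reduction in squared error (the regression-tree criterion), not XGBoost's exact gradient/hessian-based gain:

```python
# Hypothetical sketch: exact greedy split search. Outer loop over features,
# inner loop over candidate split values; the gain is the reduction in squared
# error (XGBoost's real gain uses gradient/hessian sums instead).
import numpy as np

def sse(t):
    # Sum of squared errors of a node predicting its own mean.
    return ((t - t.mean()) ** 2).sum() if len(t) else 0.0

def best_split(X, y):
    best = (None, None, 0.0)                    # (feature index, threshold, gain)
    parent = sse(y)
    for j in range(X.shape[1]):                 # loop 1: all features
        for s in np.unique(X[:, j]):            # loop 2: candidate split values
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = parent - (sse(left) + sse(right))
            if gain > best[2]:
                best = (j, s, gain)
    return best

X = np.random.default_rng(0).normal(size=(100, 3))
y = (X[:, 1] > 0.5).astype(float) + 0.05 * np.random.default_rng(1).normal(size=100)
print(best_split(X, y))   # expected: feature index 1, threshold near 0.5
```
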
  21. How does XGBoost handle missing values?
  • 1) When searching for the split point on a feature, samples with missing values are not traversed; only the feature values of the non-missing samples are enumerated, which reduces the time cost of finding split points for sparse, discrete features.
  • 2) In addition, to ensure completeness, samples with missing values are assigned to the left child and then to the right child, and the direction with the larger gain after the split is recorded as the default branch for samples whose feature value is missing at prediction time (see the sketch after this list).
  • 3) If the training set has no missing values but the test set does, the missing values are sent to the right child by default.
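
A minimal sketch of point 2) above, with a hypothetical helper that reuses the squared-error gain from the split-search sketch: samples with a missing feature value are tried in the left child and then in the right child, and the direction with the larger gain becomes the default branch.

```python
# Hypothetical sketch: learning the default direction for missing values at a
# fixed split (threshold s). Missing rows are tried in the left child and then
# in the right child; the direction with the larger squared-error gain wins.
import numpy as np

def sse(t):
    return ((t - t.mean()) ** 2).sum() if len(t) else 0.0

def default_direction(x_j, y, s):
    missing = np.isnan(x_j)
    left = np.zeros(len(x_j), dtype=bool)
    left[~missing] = x_j[~missing] <= s
    right = ~missing & ~left
    parent = sse(y)
    gain_left = parent - (sse(y[left | missing]) + sse(y[right]))    # missing go left
    gain_right = parent - (sse(y[left]) + sse(y[right | missing]))   # missing go right
    return "left" if gain_left >= gain_right else "right"

x_j = np.array([0.1, 0.4, np.nan, 0.9, np.nan, 0.2])
y = np.array([0.0, 0.0, 0.1, 1.0, 1.1, 0.0])
print(default_direction(x_j, y, s=0.5))   # "right": sending missing rows right leaves a purer left child
```
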
  22. Why does GBDT use regression trees rather than classification trees? Because each round of GBDT fits the gradient, which is a continuous value, a regression tree is used (see the sketch below).
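
A minimal sketch of this point, assuming scikit-learn's `DecisionTreeRegressor` (the data and hyperparameters are illustrative): under squared loss each boosting round fits a regression tree to the continuous residuals, i.e. the negative gradients.

```python
# Hypothetical sketch: gradient boosting with squared loss. Every round fits a
# regression tree to the continuous residuals (the negative gradients).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

pred, lr, trees = np.full(200, y.mean()), 0.1, []
for _ in range(100):
    residual = y - pred                          # continuous target for this round
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)                           # kept so the ensemble can predict later
    pred += lr * tree.predict(X)

print("training MSE:", np.mean((y - pred) ** 2))
```
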
  23. How to evaluate a classifier (classifier evaluation metrics)?
  24. Introduce the K-Means algorithm. Does K-Means converge, and why? The objective K-Means optimizes is the sum of squared distances of each sample to its cluster center. Each iteration consists of two steps, updating the centers and updating the sample assignments, and both steps can only decrease (or keep) the objective, so the algorithm is guaranteed to converge (see the sketch below). K-Means can also be viewed as a special case of the EM algorithm, and EM is guaranteed to converge.
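
A minimal NumPy sketch of the two alternating steps (the function name and initialization are illustrative); each step can only keep the within-cluster sum of squares the same or lower, which is the reason the iteration converges:

```python
# Hypothetical sketch: K-Means as two alternating steps, each of which can only
# keep the objective (sum of squared distances to the assigned center) the same or lower.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # init from data points
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step 1: assign every sample to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Step 2: move every center to the mean of its samples
        # (empty clusters are not handled in this sketch).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):    # no change -> converged
            break
        centers = new_centers
    return centers, labels

X = np.vstack([np.random.default_rng(1).normal(0, 1, (50, 2)),
               np.random.default_rng(2).normal(5, 1, (50, 2))])
print(kmeans(X, k=2)[0])   # two centers, roughly (0, 0) and (5, 5)
```
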
  25. Methods for handling sample imbalance: class-weight adjustment, sampling (over-sampling and under-sampling), metrics that are insensitive to class imbalance (e.g., F1), and focal loss. The core idea of focal loss is to use a suitable function to measure how much hard-to-classify and easy-to-classify samples contribute to the total loss: a modulating factor down-weights easy samples so that training focuses on the hard ones (see the sketch below).
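
A minimal NumPy sketch of binary focal loss (γ and α are the usual tunable constants; the values below are illustrative): the modulating factor (1 − p_t)^γ shrinks the loss of well-classified samples.

```python
# Hypothetical sketch: binary focal loss. The factor (1 - p_t)**gamma shrinks the
# loss of easy samples (p_t close to 1) so that hard samples dominate the total loss.
import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    p = np.clip(p_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)  # class-balance weight
    return -(alpha_t * (1 - p_t) ** gamma * np.log(p_t)).mean()

y = np.array([1, 1, 0, 0])
p = np.array([0.95, 0.6, 0.05, 0.4])           # two easy samples, two harder ones
print(focal_loss(y, p))                        # easy samples contribute almost nothing
print(focal_loss(y, p, gamma=0.0, alpha=0.5))  # gamma=0, alpha=0.5 is plain cross-entropy x 0.5
```
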
  26. The EM algorithm
  27. Precision, accuracy, recall, ROC, and AUC: https://blog.csdn.net/u013063099/article/details/80964865
  28. The PR curve: https://www.guoyaohua.com/classification-metrics.html#pr%E6%9B%B2%E7%BA%BF
  29. AUC: principle and how to compute it (see the sketch below): https://blog.csdn.net/qq_22238533/article/details/78666436
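
A minimal sketch of one common way to compute AUC (pair counting, equivalent to the rank-based formula in the linked post; ties count as 0.5): AUC is the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one.

```python
# Hypothetical sketch: AUC as the fraction of (positive, negative) pairs in which
# the positive sample receives the higher score; ties count as 0.5.
import numpy as np

def auc(y_true, scores):
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([1, 1, 1, 0, 0])
s = np.array([0.9, 0.7, 0.3, 0.6, 0.2])
print(auc(y, s))   # 5 of the 6 pairs are ordered correctly -> 0.8333...
```
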
  30. The difference between a Bayes classifier and Naive Bayes
  31. The difference between classification trees and regression trees
  32. The principles of decision trees
  33. What are generative models and discriminative models?
  34. What is the difference between the objective function, loss function, and cost function?

    The objective function is the function that is ultimately optimized; it consists of the empirical loss and the structural loss.

    obj=loss+Ω

    The empirical loss (loss) is what is usually called the loss function or cost function. The structural loss (Ω) is a term, such as a regularizer, that controls the complexity of the model (a small numeric sketch follows).
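
A small numeric sketch of the decomposition, with MSE as the empirical loss and an L2 penalty as the structural term Ω (λ is an illustrative constant):

```python
# Hypothetical sketch: objective = empirical loss + structural term.
import numpy as np

def objective(y_true, y_pred, w, lam=0.1):
    loss = np.mean((y_true - y_pred) ** 2)   # empirical loss (here: MSE)
    omega = lam * np.sum(w ** 2)             # structural loss: L2 complexity penalty
    return loss + omega

w = np.array([0.5, -1.2, 0.3])
y_true, y_pred = np.array([1.0, 0.0, 2.0]), np.array([0.9, 0.2, 1.7])
print(objective(y_true, y_pred, w))          # obj = loss + Omega
```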

  35. Which algorithms need normalization, and which do not? What is the role of normalization? https://blog.csdn.net/u014535528/article/details/82977653

  36. The difference between multi-class and multi-label classification; how to choose the loss function for each?

  37. The difference between KL divergence and cross-entropy (see the sketch below): https://blog.csdn.net/Dby_freedom/article/details/83374650
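
A minimal sketch of the relationship for discrete distributions p and q: cross-entropy H(p, q) = H(p) + KL(p‖q), so with a fixed target p, minimizing cross-entropy is equivalent to minimizing the KL divergence.

```python
# Hypothetical sketch: cross-entropy H(p, q) = H(p) + KL(p || q) for discrete p, q.
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    return -np.sum(p * np.log(q))

def kl(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])   # target distribution
q = np.array([0.5, 0.3, 0.2])   # model distribution
print(cross_entropy(p, q), entropy(p) + kl(p, q))   # the two numbers match
```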

Origin: https://blog.csdn.net/Matrix_cc/article/details/105514707