[Machine Learning 8 Questions]

1. Boosting Tree

What is a boosting tree?
A: A boosting tree is an additive model built from decision trees trained sequentially: each individual tree is a weak classifier, and at every round the samples misclassified so far are given greater weight so that later trees focus on them.
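A minimal sketch of this idea, assuming scikit-learn (the dataset and parameter values below are only illustrative):

```python
# AdaBoost builds an ensemble of shallow decision trees (weak classifiers)
# and re-weights the training samples each round so that misclassified
# samples count more in the next round.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The default base learner is a depth-1 decision tree (a "stump").
model = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```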

2. GBDT

What is GBDT, and how does it relate to boosting trees? How does GBDT handle regression problems, and how does it handle classification problems? How is the loss function defined for regression, how is it defined for classification, and why is it defined that way? How is that loss function optimized, and is there any way to improve the optimization? Which framework do you use for this, which functions do you call, and what are the important parameters to tune?
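As one hedged illustration, assuming scikit-learn's GradientBoostingRegressor as the GBDT implementation and the squared-error loss for regression (parameter names follow recent scikit-learn versions; the values are placeholders):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree is fit to the negative gradient of the loss; for squared
# error this is simply the residual of the current ensemble's predictions.
gbdt = GradientBoostingRegressor(
    loss="squared_error",   # regression loss; classification uses log loss
    n_estimators=300,       # number of boosting rounds (trees)
    learning_rate=0.05,     # shrinkage applied to each tree's contribution
    max_depth=3,            # depth of the individual regression trees
    subsample=0.8,          # row subsampling (stochastic gradient boosting)
    random_state=0,
)
gbdt.fit(X_train, y_train)
print("R^2 on test:", gbdt.score(X_test, y_test))
```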

3. Regularization

Why can L1 and L2 regularization reduce overfitting? What is the difference between L1 and L2, and what are the corresponding parameters in sklearn and xgboost?
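A small sketch of how these penalties are exposed in the two libraries (assuming scikit-learn and the xgboost Python package are installed; the values are placeholders):

```python
from sklearn.linear_model import Lasso, Ridge, LogisticRegression
import xgboost as xgb

# scikit-learn: L1 -> Lasso / penalty="l1", L2 -> Ridge / penalty="l2"
l1_reg = Lasso(alpha=0.1)   # L1 penalty on linear regression weights
l2_reg = Ridge(alpha=1.0)   # L2 penalty on linear regression weights
l1_clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
l2_clf = LogisticRegression(penalty="l2", C=1.0)

# xgboost: reg_alpha is the L1 term, reg_lambda is the L2 term on leaf weights
booster = xgb.XGBRegressor(reg_alpha=0.1, reg_lambda=1.0)
```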

4. What is the essential difference between KNN and logistic regression?

Answer: Logistic regression is a parametric model whose decision boundary is linear in the features, while KNN is a non-parametric, instance-based method whose decision boundary is nonlinear.

5. Random Forest

Why can a random forest be used for feature selection? What are its important parameters, what do they mean, and how do you tune them?

A: A random forest can be used for feature selection because each tree's splits yield per-feature importance scores (for example, mean decrease in impurity) that can be averaged across the forest; dropping low-importance features makes the model more generalizable and robust.
The important parameters of a random forest include (see the sketch after this list):
max_depth controls the depth of each tree. As a rule of thumb, the depth need not exceed int(log2(n)) + 1, where n is n_features (the number of features); after feature selection the depth will usually be below this value. When the feature set is large, setting max_depth prevents the model from overfitting.
max_features is the number of features considered at each split, tuned according to the size of the data. When the dataset contains many irrelevant features, those features act as noise, so drawing only a subset of features for each split improves the generalization of the model.
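A minimal sketch under these assumptions (scikit-learn's RandomForestClassifier; the dataset and parameter values are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_depth=None,       # cap this (e.g. int(np.log2(n_features)) + 1) to curb overfitting
    max_features="sqrt",  # number of features considered at each split
    random_state=0,
)
rf.fit(X, y)

# Impurity-based importances, averaged over trees; low-importance (noisy)
# features can be dropped before retraining.
order = np.argsort(rf.feature_importances_)[::-1]
print("top features:", order[:8])
```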

6. Overfitting

How do you judge whether a model is overfitting? How do you deal with overfitting?
Answer: When the model's accuracy on the training set is close to 100% but its accuracy on unseen test sets is much lower, the model is overfitting.
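One way to make this check concrete, assuming scikit-learn (any estimator would do; the deep unpruned tree here is just a convenient overfitter):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep_tree = DecisionTreeClassifier(max_depth=None).fit(X_train, y_train)
train_acc = deep_tree.score(X_train, y_train)   # often close to 100%
test_acc = deep_tree.score(X_test, y_test)      # much lower => overfitting

print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```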

7. Optimization algorithm

Describe in detail one optimization algorithm you are particularly familiar with. What other optimization algorithms are there? Which are first-order methods, which are second-order methods, and how are they related?

Answer:
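As one illustration of a first-order method, a minimal gradient-descent sketch on a least-squares objective (NumPy only; the variable names and step size are my own, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.1  # step size
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5 * mean squared error
    w -= lr * grad                     # first-order update; a second-order
                                       # method (e.g. Newton) would also use the Hessian
print("estimated weights:", np.round(w, 3))
```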

8. Sample imbalance

How do you handle class imbalance? Among the common algorithms, which ones are relatively insensitive to class imbalance?
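A short sketch of two common remedies, assuming scikit-learn (class weighting and simple random oversampling; dedicated resamplers exist elsewhere, e.g. the imbalanced-learn package):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Option 1: re-weight the loss so minority-class errors cost more.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Option 2: randomly oversample the minority class before fitting.
minority = np.where(y == 1)[0]
extra = np.random.default_rng(0).choice(minority, size=len(y) - 2 * len(minority))
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
clf_bal = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
```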

9. Which Python API call swaps the dimension order of a three- or four-dimensional array?

A: .transpose (numpy.transpose / ndarray.transpose), with the desired axis order as the argument.
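A small example of that call on 3-D and 4-D arrays (NumPy; torch.permute plays the same role for PyTorch tensors):

```python
import numpy as np

x3 = np.zeros((2, 3, 4))
print(x3.transpose(1, 0, 2).shape)           # (3, 2, 4)

x4 = np.zeros((8, 3, 32, 32))                # e.g. an NCHW image batch
print(np.transpose(x4, (0, 2, 3, 1)).shape)  # (8, 32, 32, 3) -> NHWC
```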
