Task 4: Modeling and Parameter Tuning

Summary of "Datawhale Zero-Based Entry to Data Mining — Task 4: Modeling and Parameter Tuning", by Light Rain

Model Building

1) Python's sklearn library integrates the commonly used models well, including several ensemble learning methods. The XGBoost and LightGBM models, which are popular in competitions and often give good results, are also available and can be installed with pip. The loss function most commonly used for regression models is MSE (mean squared error).
2) Regression models: linear regression, Lasso regression, and Ridge regression. They can be fitted by least squares or by gradient descent.
3) Ensemble learning models: ensemble learning includes bagging and boosting. Bagging combines multiple models (typically by weighted averaging), and the models should differ from one another; boosting reweights the training samples so that later models focus on the examples earlier models got wrong. Ensemble methods such as XGBoost and LightGBM can improve a model's generalization ability.
4) Regression and ensemble models can also serve as embedded feature-selection methods, using the fitted model to evaluate feature importance.
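As a minimal sketch of the regression models listed above, the following compares linear regression, Ridge, and Lasso on synthetic data; the data and the alpha values are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error

# Synthetic regression data: 200 samples, 5 features, small Gaussian noise
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = X @ np.array([1.5, -2.0, 0.0, 3.0, 0.5]) + 0.1 * rng.randn(200)

# Fit each model and report training MSE (the common regression loss)
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.01)):
    model.fit(X, y)
    mse = mean_squared_error(y, model.predict(X))
    print(type(model).__name__, round(mse, 4))
```

Ridge adds an L2 penalty and Lasso an L1 penalty to plain least squares; the L1 penalty can drive some coefficients exactly to zero, which is why Lasso is also used for feature selection.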

Performance Verification

1) Model performance can be evaluated by cross-validation or leave-one-out validation. In cross-validation the training data are divided into N parts; each part is used once as the validation set while the rest serve as the training set, so the model is trained N times in total.
2) Plotting learning curves and validation curves makes it possible to check whether the model is overfitting.
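The K-fold procedure described above can be sketched with sklearn's cross-validation utilities; the Ridge model and the 5-fold split are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data for the demonstration
rng = np.random.RandomState(42)
X = rng.rand(100, 3)
y = X.sum(axis=1) + 0.05 * rng.randn(100)

# N = 5: each of the 5 folds is used once as the validation set
cv = KFold(n_splits=5, shuffle=True, random_state=42)
# sklearn reports losses as negated scores, hence the minus signs below
scores = cross_val_score(Ridge(alpha=1.0), X, y,
                         scoring="neg_mean_squared_error", cv=cv)
print("MSE per fold:", -scores)
print("Mean MSE:", -scores.mean())
```

For learning and validation curves, `sklearn.model_selection.learning_curve` and `validation_curve` compute the train/validation scores that are then plotted to diagnose overfitting.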

Model Parameter Tuning

1) The main goal of parameter tuning is to improve the model's evaluation metric; different optimization algorithms can be used to search the model's parameter space.
2) Greedy algorithm: a greedy algorithm always makes the choice that looks best at the current step, i.e. a local optimum. The premise of a greedy strategy is that the sequence of locally optimal choices leads to the globally optimal solution. For hyperparameter tuning this typically means tuning one parameter at a time and fixing its best value before moving on to the next.
3) Grid search: this method loops through every combination of the candidate parameter values and returns the best combination. It is simple but inefficient when the grid is large.
4) Bayesian optimization: builds a surrogate function (a probability model) of the objective and uses it to search for the parameter values that minimize the objective. Unlike random or grid search, the Bayesian method uses the results of past evaluations to decide which parameter set to try next, so each new trial benefits from the previous ones.
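Of the three strategies above, grid search is the easiest to show concretely. A hedged sketch with sklearn's `GridSearchCV`, which tries every combination in the grid under cross-validation; the Ridge model and the alpha grid are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data for the demonstration
rng = np.random.RandomState(7)
X = rng.rand(100, 4)
y = X @ np.array([2.0, -1.0, 0.5, 1.0]) + 0.1 * rng.randn(100)

# Grid search: every candidate value is evaluated with 5-fold CV
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(Ridge(), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)
print("Best params:", search.best_params_)
```

Bayesian optimization is available in libraries such as hyperopt and scikit-optimize, which replace the exhaustive loop with a surrogate-model-guided search.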

Personal Understanding and Summary

1) There are many regression models, including support vector regression, neural networks, and deep neural networks; different models suit different data sets. There are also ways to convert a regression problem into a classification problem for processing. Different business problems have different solutions, so for a given problem we need to try different models and compare their performance.
2) Parameter tuning is also a very important means of improving model performance. Models such as neural networks and deep neural networks have a great many hyperparameters and no universal tuning rules, so tuning skill is needed to get the best performance out of them. My current understanding of efficient tuning techniques for deep neural networks is still shallow, and more exploration is needed.

Practice with the Tianchi used-car price prediction example:

1) Building on the engineered features, the project uses 5-fold cross-validation and XGBoost for modeling and analysis, with grid search for parameter tuning.
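The practice workflow above (5-fold cross-validation + boosting model + grid search) can be sketched as below. Note this uses sklearn's GradientBoostingRegressor as a stand-in for XGBoost (XGBRegressor follows the same fit/predict API), and both the data and the parameter grid are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the used-car feature matrix
rng = np.random.RandomState(0)
X = rng.rand(300, 6)
y = 3.0 * X[:, 0] + np.sin(4 * X[:, 1]) + 0.1 * rng.randn(300)

# 5-fold cross-validation, as in the project
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Grid search over a small illustrative grid of boosting parameters
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, scoring="neg_mean_squared_error", cv=cv)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV MSE:", -search.best_score_)
```

Swapping in `xgboost.XGBRegressor` for the estimator (after `pip install xgboost`) keeps the rest of the workflow unchanged.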



Origin blog.csdn.net/lybch1/article/details/105232214