Cross-validation and grid search

Cross-validation

Cross-validation: divide the available training data into training and validation sets. Taking the figure below as an example: split the data into 4 parts, using one part as the validation set each time. After 4 rounds of evaluation, each with a different part held out as the validation set, we obtain results from 4 models and take their average as the final result. This is called 4-fold cross-validation. In practice, 10-fold cross-validation is commonly used.
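The 4-fold procedure above can be sketched with scikit-learn's `KFold` and `cross_val_score`. The iris dataset and the k-nearest neighbors estimator here are illustrative choices, not part of the original text:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 4 folds: each fold serves as the validation set exactly once.
kf = KFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=kf)

print(scores)          # one accuracy score per fold
print(scores.mean())   # average of the 4 folds, taken as the final result
```

Passing `cv=10` instead would give the 10-fold variant mentioned above.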

When splitting a dataset, we divide the data into a training set and a test set; but to make the results obtained from training more reliable, the data is split as follows:

·Training data: training set + validation set
·Test data: test set

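A minimal sketch of this split: a held-out test set is carved off first, and cross-validation then divides the remainder into training/validation folds. The 20% test fraction and the iris dataset are assumptions for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a test set that is never touched during model selection ...
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# ... cross-validation later splits X_trainval into training + validation folds.
print(len(X_trainval), len(X_test))
```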

Hyperparameter Search-Grid Search

Usually, many parameters must be specified manually (such as the value of K in the k-nearest neighbors algorithm); these are called hyperparameters.
Since tuning them by hand is tedious, we preset several hyperparameter combinations for the model and evaluate each combination with cross-validation.
Finally, the best combination is selected to build the model.
For example, in the k-nearest neighbors algorithm:
    choose K=3 --> model 1 -- cross-validation (accuracy)
    choose K=5 --> model 2 -- cross-validation (accuracy)
    choose K=7 --> model 3 -- cross-validation (accuracy)
    choose K=9 --> model 4 -- cross-validation (accuracy)
    choose K=11 --> model 5 -- cross-validation (accuracy)
    ......
    and finally the best model is selected.
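The selection process above can be written as a plain loop before reaching for `GridSearchCV`: score each candidate K with cross-validation and keep the best. The iris dataset is again only an illustrative assumption:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean cross-validated accuracy for each candidate K.
results = {}
for k in [3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10)
    results[k] = scores.mean()

# The K with the highest mean accuracy wins.
best_k = max(results, key=results.get)
print(best_k, results[best_k])
```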

Model selection and tuning API

sklearn.model_selection.GridSearchCV(estimator, param_grid=None, cv=None)  # grid search with cross-validation
Performs an exhaustive search over the specified parameter values of an estimator.
    ·estimator: the estimator object
    ·param_grid: estimator parameters (dict), e.g. {"n_neighbors": [1, 3, 5, ...]}
    ·cv: number of cross-validation folds
.fit(): fit on the training data
.score(): accuracy
Result analysis:
    ·best parameters: best_params_
    ·best result: best_score_
    ·best estimator: best_estimator_
    ·cross-validation results: cv_results_
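Putting the API together, a minimal end-to-end sketch (dataset, grid values, and split are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every candidate K is evaluated with 10-fold cross-validation.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=10)
search.fit(X_train, y_train)

print(search.best_params_)     # best parameters
print(search.best_score_)      # best mean cross-validated accuracy
print(search.best_estimator_)  # estimator refit with the best parameters
print(search.score(X_test, y_test))  # accuracy on the held-out test set
```

After fitting, `search.cv_results_` holds the per-fold scores for every parameter combination.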


Origin blog.csdn.net/qq_38851184/article/details/112541758