Machine Learning in Practice: Grid Search and Cross-Validation (GridSearchCV)

1. Experimental Purpose

(1) Use GridSearchCV to compare how different models, and different parameter settings for each model, affect the cross-validated score.
(2) Store each model and its candidate parameters in a dictionary.
(3) The dataset used in the experiment is the handwritten digits dataset bundled with sklearn.

2. Import the necessary modules and read the data

from sklearn import datasets
from sklearn import svm    # support vector machine
from sklearn.ensemble import RandomForestClassifier  # random forest
from sklearn.linear_model import LogisticRegression   # logistic regression
from sklearn.naive_bayes import GaussianNB      # naive Bayes assuming Gaussian-distributed features
from sklearn.naive_bayes import MultinomialNB   # naive Bayes assuming multinomially distributed features
from sklearn.tree import DecisionTreeClassifier  # decision tree

digits = datasets.load_digits()    # load the handwritten digits dataset
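
Before building any models, it can help to check what was loaded. The following is a minimal sketch; the names n_samples, n_features and classes are introduced here only for illustration and do not appear in the rest of the post.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Each 8x8 grayscale image is flattened into one row of 64 pixel intensities
n_samples, n_features = digits.data.shape
classes = np.unique(digits.target)

print(n_samples, n_features)      # number of images, number of pixels per image
print(digits.images.shape[1:])    # shape of a single unflattened image
print(classes)                    # the distinct digit labels

# MultinomialNB expects non-negative features, so this check is worth making
print(digits.data.min() >= 0)
```

The non-negativity check matters because one of the models compared later, MultinomialNB, rejects negative feature values.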

3. Build a model-parameter dictionary

# Build a dictionary mapping each model to its parameter grid
model_params = {
    'svm':{
        'model':svm.SVC(gamma='auto'),
        'params':{
            'C':[1,10,20],
            'kernel':['rbf','linear']
        }
    },
    'random_forest':{
        'model':RandomForestClassifier(),
        'params':{
            'n_estimators':[1,5,10]
        }
    },
    'logistic_regression':{
        'model':LogisticRegression(),
        'params':{
            'C':[1,5,10]
        }
    },
    'naive_bayes_gaussian':{
        'model':GaussianNB(),
        'params':{}
    },
    'naive_bayes_multinomial':{
        'model':MultinomialNB(),
        'params':{}
    },
    'decision_tree':{
        'model':DecisionTreeClassifier(),
        'params':{
            'criterion':['gini','entropy']
        }
    }
}
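
To see what GridSearchCV will actually try for one of these entries, the grid can be expanded by hand with sklearn's ParameterGrid. This is only an illustrative sketch: the two grids are copied from the dictionary above so the block runs on its own, and the variable names are made up for this example.

```python
from sklearn.model_selection import ParameterGrid

# Grids copied from the 'svm' and 'naive_bayes_gaussian' entries above
svm_grid = {'C': [1, 10, 20], 'kernel': ['rbf', 'linear']}
empty_grid = {}

# ParameterGrid enumerates every combination of the listed values
svm_candidates = list(ParameterGrid(svm_grid))
empty_candidates = list(ParameterGrid(empty_grid))

print(len(svm_candidates))        # 3 values of C times 2 kernels
for candidate in svm_candidates:
    print(candidate)

# An empty grid still yields one candidate: the model with its default settings
print(empty_candidates)
```

This is why the two naive Bayes entries can use an empty 'params' dictionary: the search simply cross-validates the model as constructed.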

4. Training

from sklearn.model_selection import GridSearchCV    # grid search with cross-validation
import pandas as pd
scores = []

for model_name, mp in model_params.items():
    clf = GridSearchCV(mp['model'],mp['params'],cv=5,return_train_score=False)   # set up a 5-fold cross-validated grid search
    clf.fit(digits.data,digits.target)   # fit every parameter combination
    scores.append({
        'model':model_name,
        'best_score':clf.best_score_,
        'best_params':clf.best_params_
    })
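
The loop above keeps only the best score and best parameters for each model. To look at every parameter combination for a single model, the fitted search object also exposes cv_results_. The sketch below assumes a reduced grid of two candidates (not the grid used in the post) purely to keep the run short.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()

# Assumption: a smaller grid than in the post, chosen only for speed
param_grid = {'C': [1, 10], 'kernel': ['linear']}
clf = GridSearchCV(svm.SVC(gamma='auto'), param_grid, cv=5, return_train_score=False)
clf.fit(digits.data, digits.target)

# cv_results_ has one entry per parameter combination
results = pd.DataFrame(clf.cv_results_)
summary = results[['param_C', 'param_kernel', 'mean_test_score', 'rank_test_score']]
print(summary)

# best_score_ is the mean cross-validated score of the top-ranked combination
print(clf.best_params_, clf.best_score_)
```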

5. Show the best score and best parameters for each model

df = pd.DataFrame(scores,columns=['model','best_score','best_params'])   # collect the results in a table
df
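
If the goal is to pick a winner, the table can be sorted by score. The following is a rough sketch rather than part of the original experiment: it assumes only the two naive Bayes models, which train quickly, and the names models and ranked are hypothetical.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB, MultinomialNB

digits = datasets.load_digits()

# Assumption: a two-model subset of the comparison, to keep the example fast
models = {
    'naive_bayes_gaussian': GaussianNB(),
    'naive_bayes_multinomial': MultinomialNB(),
}

scores = []
for model_name, model in models.items():
    clf = GridSearchCV(model, {}, cv=5)   # empty grid: cross-validate the defaults
    clf.fit(digits.data, digits.target)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_,
    })

df = pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])

# Highest mean cross-validated score first
ranked = df.sort_values('best_score', ascending=False).reset_index(drop=True)
print(ranked)
```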

Origin blog.csdn.net/weixin_37763870/article/details/105465641