Sklearn Notes: Hyperparameter Optimization

Based on the sklearn documentation: tuning the hyper-parameters of an estimator

Exhaustive grid search (GridSearchCV)

Basic syntax

help(GridSearchCV):

class GridSearchCV(BaseSearchCV):

Exhaustive search over specified parameter values for an estimator.

Important members are fit, predict.

GridSearchCV implements a "fit" and a "score" method. It also implements "predict", "predict_proba", "decision_function",
"transform" and "inverse_transform" if they are implemented in the estimator used.

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.

Signature:

GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, iid='deprecated', refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)

refit: refit the estimator on the whole dataset using the best parameters found.
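A minimal sketch of what refit controls (the iris data and the tiny grid are only placeholders, not part of the original notes):

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

# refit=True (the default): after the search, the best parameter setting is refit
# on all of X, y, so the search object itself can be used for prediction.
search = GridSearchCV(svm.SVC(), {'C': [1, 10]}, refit=True).fit(X, y)
print(search.best_estimator_)
print(search.predict(X[:3]))

# refit=False: only the cross-validation results are kept; best_params_ is still
# available (with single-metric scoring), but best_estimator_ and predict are not.
search_no_refit = GridSearchCV(svm.SVC(), {'C': [1, 10]}, refit=False).fit(X, y)
print(search_no_refit.best_params_)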

param_grid example:

param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]
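Such a list of dicts defines two separate sub-grids. To enumerate the concrete candidates it expands to, sklearn.model_selection.ParameterGrid can be used; a small sketch:

from sklearn.model_selection import ParameterGrid

param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

# The first dict contributes 4 candidates, the second 4 * 2 = 8, so 12 in total.
candidates = list(ParameterGrid(param_grid))
print(len(candidates))   # 12
for params in candidates[:2]:
    print(params)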

Example 1: applying grid search to an SVM (SVC)
# Example 1:
from sklearn import svm,datasets
from sklearn.model_selection import GridSearchCV

iris=datasets.load_iris()
params={'kernel':("linear",'rbf'),"C":[1,10]}
svc=svm.SVC()
clf=GridSearchCV(svc,params)
clf.fit(iris.data,iris.target)
GridSearchCV(cv=None, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid={'C': [1, 10], 'kernel': ('linear', 'rbf')},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

Searcher attributes

Attributes of clf:

cv_results_ : dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.

best_estimator_ : estimator
Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data. Not available if refit=False.

best_score_ : float
Mean cross-validated score of the best_estimator_.

best_params_ : dict
Parameter setting that gave the best results on the hold out data.

scorer_ : function or a dict
Scorer function used on the held out data to choose the best parameters for the model.

n_splits_ : int
The number of cross-validation splits (folds/iterations).

refit_time_ : float
Seconds used for refitting the best model on the whole dataset. This is present only if refit is not False.


for i in sorted(clf.cv_results_.keys()):
    print("{:15s} :".format(i),clf.cv_results_[i])
mean_fit_time   : [0.00139923 0.0017993  0.00159888 0.00140028]
mean_score_time : [0.00079985 0.0006001  0.00099983 0.00099897]
mean_test_score : [0.98       0.96666667 0.97333333 0.98      ]
param_C         : [1 1 10 10]
param_kernel    : ['linear' 'rbf' 'linear' 'rbf']
params          : [{'C': 1, 'kernel': 'linear'}, {'C': 1, 'kernel': 'rbf'}, {'C': 10, 'kernel': 'linear'}, {'C': 10, 'kernel': 'rbf'}]
rank_test_score : [1 4 3 1]
split0_test_score : [0.96666667 0.96666667 1.         0.96666667]
split1_test_score : [1.         0.96666667 1.         1.        ]
split2_test_score : [0.96666667 0.96666667 0.9        0.96666667]
split3_test_score : [0.96666667 0.93333333 0.96666667 0.96666667]
split4_test_score : [1. 1. 1. 1.]
std_fit_time    : [0.00135593 0.00074813 0.00079974 0.00080094]
std_score_time  : [3.99923509e-04 4.89979892e-04 6.32711677e-04 1.19685177e-06]
std_test_score  : [0.01632993 0.02108185 0.03887301 0.01632993]
clf.best_estimator_
SVC(C=1, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
clf.best_params_
{'C': 1, 'kernel': 'linear'}
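As noted under cv_results_ above, the full results table can be loaded into a pandas DataFrame, which is usually easier to read than printing the raw dict. A minimal sketch, continuing from the clf fitted in Example 1:

import pandas as pd

# One row per parameter combination, one column per cv_results_ key.
results = pd.DataFrame(clf.cv_results_)
print(results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']])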

The choice of param_grid depends on the estimator, here svc = svm.SVC().

help(svm.SVC)

Parameters:
    C: float, default=1.0. Regularization parameter (must be strictly positive).

    kernel: Specifies the kernel type to be used in the algorithm.

        Available options: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable.

    degree: int, default=3. Degree of the polynomial kernel function. Only used by 'poly'.
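Based on these parameters, a grid that also covers the 'poly' kernel might look like the following sketch (the specific values are illustrative, not taken from the notes above):

# degree only matters for the 'poly' kernel, so it is listed only in that sub-grid.
param_grid = [
  {'kernel': ['linear'], 'C': [1, 10]},
  {'kernel': ['poly'], 'degree': [2, 3, 4], 'C': [1, 10]},
]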

Example 2: grid search on the digits dataset with precision and recall scoring
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC
# load the dataset
digits=datasets.load_digits(return_X_y=True)
#X
digits[0]
array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ..., 10.,  0.,  0.],
       [ 0.,  0.,  0., ..., 16.,  9.,  0.],
       ...,
       [ 0.,  0.,  1., ...,  6.,  0.,  0.],
       [ 0.,  0.,  2., ..., 12.,  0.,  0.],
       [ 0.,  0., 10., ..., 12.,  1.,  0.]])
#y
digits[1]
array([0, 1, 2, ..., 8, 9, 8])
X_train,X_test,y_train,y_test=train_test_split(digits[0],digits[1],test_size=0.5,
                                              random_state=0)
# parameter settings
tuned_parameters=[{'kernel':['rbf'],'gamma':[1e-3,1e-4],'C':[1,10,100,1000]},
       {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
# scoring (evaluation metrics)
scores=['precision','recall']

for score in scores:
    print("# tuning hyper-parameters for %s"%score)
    print()
    clf=GridSearchCV(SVC(),tuned_parameters,scoring='%s_macro'%score)
    clf.fit(X_train,y_train)
    print("Best parameters set found on development set: \n")
    print(clf.best_params_)
    print()
    print("Grid scores in development set:\n")
    means=clf.cv_results_['mean_test_score']
    stds=clf.cv_results_['std_test_score']
    for mean,std,params in zip(means,stds,clf.cv_results_['params']):
        print("%0.3f+(+/-%0.03f) for %r"%(mean,std*2,params))
        
    print("Detailed classification report:\n")
    y_true,y_pred=y_test,clf.predict(X_test)
    print(classification_report(y_true,y_pred))
    print()
    
# tuning hyper-parameters for precision

Best parameters set found on development set: 

{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}

Grid scores in development set:

0.986+(+/-0.016) for {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}
0.959+(+/-0.028) for {'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
0.988+(+/-0.017) for {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
0.982+(+/-0.026) for {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}
0.988+(+/-0.017) for {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
0.983+(+/-0.026) for {'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}
0.988+(+/-0.017) for {'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}
0.983+(+/-0.026) for {'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}
0.974+(+/-0.012) for {'C': 1, 'kernel': 'linear'}
0.974+(+/-0.012) for {'C': 10, 'kernel': 'linear'}
0.974+(+/-0.012) for {'C': 100, 'kernel': 'linear'}
0.974+(+/-0.012) for {'C': 1000, 'kernel': 'linear'}
Detailed classification report:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        89
           1       0.97      1.00      0.98        90
           2       0.99      0.98      0.98        92
           3       1.00      0.99      0.99        93
           4       1.00      1.00      1.00        76
           5       0.99      0.98      0.99       108
           6       0.99      1.00      0.99        89
           7       0.99      1.00      0.99        78
           8       1.00      0.98      0.99        92
           9       0.99      0.99      0.99        92

    accuracy                           0.99       899
   macro avg       0.99      0.99      0.99       899
weighted avg       0.99      0.99      0.99       899


# tuning hyper-parameters for recall

Best parameters set found on development set: 

{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}

Grid scores in development set:

0.986+(+/-0.019) for {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}
0.957+(+/-0.028) for {'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
0.987+(+/-0.019) for {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
0.981+(+/-0.028) for {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}
0.987+(+/-0.019) for {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
0.982+(+/-0.026) for {'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}
0.987+(+/-0.019) for {'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}
0.982+(+/-0.026) for {'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}
0.971+(+/-0.010) for {'C': 1, 'kernel': 'linear'}
0.971+(+/-0.010) for {'C': 10, 'kernel': 'linear'}
0.971+(+/-0.010) for {'C': 100, 'kernel': 'linear'}
0.971+(+/-0.010) for {'C': 1000, 'kernel': 'linear'}
Detailed classification report:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        89
           1       0.97      1.00      0.98        90
           2       0.99      0.98      0.98        92
           3       1.00      0.99      0.99        93
           4       1.00      1.00      1.00        76
           5       0.99      0.98      0.99       108
           6       0.99      1.00      0.99        89
           7       0.99      1.00      0.99        78
           8       1.00      0.98      0.99        92
           9       0.99      0.99      0.99        92

    accuracy                           0.99       899
   macro avg       0.99      0.99      0.99       899
weighted avg       0.99      0.99      0.99       899
Example 3: nested vs. non-nested cross-validation

Reference: cawley10a, "On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation".

import numpy as np
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
num_trials=30
iris=datasets.load_iris()
X_iris=iris.data
y_iris=iris.target
# parameter settings
p_grid={'C':[1,10,100],'gamma':[0.01,0.1]}
svm=SVC(kernel='rbf')
# Arrays to store scores
non_nested_scores = np.zeros(num_trials)
nested_scores = np.zeros(num_trials)

# Loop for each trial
for i in range(num_trials):

    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non_nested parameter search and scoring
    clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=inner_cv)
    clf.fit(X_iris, y_iris)
    non_nested_scores[i] = clf.best_score_

    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    nested_scores[i] = nested_score.mean()

score_difference = non_nested_scores - nested_scores

print("Average difference of {:6f} with std. dev. of {:6f}."
      .format(score_difference.mean(), score_difference.std()))
Average difference of 0.007581 with std. dev. of 0.007833.

Randomized search (RandomizedSearchCV)

Basic syntax
from sklearn.model_selection import RandomizedSearchCV

help(RandomizedSearchCV):
Randomized search on hyper parameters.

RandomizedSearchCV implements a "fit" and a "score" method.

  • Description: its functionality is very similar to that of grid search.

  • Difference: the parameters of the estimator are optimized by cross-validated search over sampled parameter settings.

In contrast to GridSearchCV, not all parameter values are tried out; instead, a fixed number of parameter settings is sampled from the specified distributions.

The number of parameter settings that are tried is given by n_iter. If all parameters are presented as a list, sampling without replacement is performed.

If at least one parameter is given as a distribution, sampling with replacement is used. It is highly recommended to use continuous distributions for continuous parameters.

Signature:

RandomizedSearchCV(estimator, param_distributions, n_iter=10, scoring=None, n_jobs=None, iid='deprecated', refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score=nan, return_train_score=False)

Parameters:

param_distributions : dict or list of dicts (dictionary with parameter names (strings) as keys and distributions or lists of parameters to try)

    **The emphasis here is on parameter distributions.**

n_iter : int, default=10 (number of parameter settings that are sampled; n_iter trades off runtime against the quality of the solution)

For continuous hyperparameters, the important choice is the parameter distribution.
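For parameters such as C or gamma that range over several orders of magnitude, a log-uniform distribution is a common choice. A sketch, assuming scipy.stats.loguniform is available (SciPy >= 1.4):

from scipy.stats import loguniform

# RandomizedSearchCV samples from any object with an rvs() method; these draw
# values uniformly on a log scale between the given bounds.
param_distributions = {
    'C': loguniform(1e-3, 1e3),
    'gamma': loguniform(1e-4, 1e-1),
}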

Example 1: basic syntax exercise

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
iris=load_iris()
logistic=LogisticRegression(solver='saga',tol=1e-2,max_iter=200,
                           random_state=0)
#one obtains the uniform distribution on ``[loc, loc + scale]``.
distributions=dict(C=uniform(loc=0,scale=4),penalty=['l2','l1'])
clf=RandomizedSearchCV(logistic,distributions,random_state=0)
search=clf.fit(iris.data,iris.target)
search.best_params_
{'C': 2.195254015709299, 'penalty': 'l1'}
Example 2: comparing randomized search and grid search

See the scikit-learn 0.22.2 documentation example "Comparing randomized search and grid search for hyperparameter estimation".
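The sketch below makes the same kind of comparison in miniature; the RandomForestClassifier, the grid, and the distributions are assumptions for illustration, not copied from the linked page:

from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_digits(return_X_y=True)
clf = RandomForestClassifier(n_estimators=20, random_state=0)

# Grid search evaluates every combination: 3 * 3 = 9 candidates.
param_grid = {'max_depth': [3, 5, None], 'min_samples_split': [2, 5, 10]}
grid_search = GridSearchCV(clf, param_grid).fit(X, y)

# Randomized search evaluates n_iter candidates drawn from the lists/distributions.
param_dist = {'max_depth': [3, 5, None], 'min_samples_split': randint(2, 11)}
random_search = RandomizedSearchCV(clf, param_dist, n_iter=9, random_state=0).fit(X, y)

print(grid_search.best_params_)
print(random_search.best_params_)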

Other notes

In practice, these search methods are generally used in combination with a specific algorithm.


Reposted from www.cnblogs.com/B-Hanan/p/12789054.html