Hyperparameter Optimization - Stochastic Grid Search

Table of contents

1. Theoretical limits of hyperparameter optimization and enumerated grid search

1.1 Hyperparameter optimization (HPO)

1.2 Theoretical limits and shortcomings of grid search

1.3 Enumeration Grid Search

2. Randomized grid search RandomizedSearchCV

2.1 Basic principles of random grid search

2.2 Implementation of random grid search

2.3 Theoretical Limits of Random Grid Search

2.4 Implementation of continuous random grid search


1. Theoretical limits of hyperparameter optimization and enumerated grid search

1.1 Hyperparameter optimization (HPO)

        Every machine learning algorithm has hyperparameters, and their settings strongly affect how well the algorithm performs in practice. Hyperparameter tuning is therefore one of the most basic and important tasks for machine learning engineers. Modern machine learning and deep learning algorithms have a large number of hyperparameters: their implementations are extremely flexible, and their performance depends on the combined effect of many parameters. As a result, with the current wave of artificial intelligence, hyperparameter optimization (HPO), the field concerned with selecting hyperparameters automatically, has seen a new surge of interest.

        In the world of algorithms, we would like every process to become fully automated eventually. The discipline devoted to automating machine learning is called AutoML, and automatic hyperparameter optimization is its most mature, most thoroughly studied, and best-known branch. In theory, given sufficient computing power and data, HPO should outperform manual tuning. HPO reduces human workload, and its results are easier to reproduce than those of a manual search, so it can greatly improve the reproducibility and fairness of scientific research. Contemporary hyperparameter optimization algorithms fall into four main groups: grid-based search (Grid), methods based on Bayesian optimization (Bayesian), gradient-based optimization (Gradient-based), and population-based optimization (evolutionary algorithms, genetic algorithms, etc.). Of these, grid search methods and Bayesian optimization methods are the most popular, and Bayesian optimization can even be regarded as the state of the art (SOTA) in contemporary hyperparameter optimization. These methods matter a great deal when tuning complex ensemble algorithms.

1.2 Theoretical limits and shortcomings of grid search

        Among all hyperparameter optimization algorithms, enumerated grid search is the most basic and classic method. Before the search starts, we manually list the candidate values of each hyperparameter one by one. The candidate values of the different hyperparameters are then combined to form a parameter space. Enumerated grid search trains the model on every parameter combination in this space and finally selects the combination with the strongest generalization ability as the model's hyperparameters.

        For grid search, if some point in the parameter space corresponds to the true minimum of the loss function, then enumerating the grid is guaranteed to find that minimum and its parameters. Conversely, if no point in the parameter space corresponds to the true minimum of the loss function, grid search can never find the parameter combination that achieves it.
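This property can be illustrated with a minimal sketch. It assumes a hypothetical two-parameter loss `toy_loss` whose true minimum lies at (0.3, 0.7); the function and both grids are made up for illustration and are not part of the experiment in this article. Enumerating every combination recovers the true minimum only when the grid happens to contain it:

```python
from itertools import product

def toy_loss(a, b):
    # Hypothetical loss surface with its true minimum (value 0) at a=0.3, b=0.7
    return (a - 0.3) ** 2 + (b - 0.7) ** 2

def enumerate_grid(grid):
    # Evaluate every combination in the grid and return (best_loss, best_params)
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        loss = toy_loss(*values)
        if best is None or loss < best[0]:
            best = (loss, dict(zip(names, values)))
    return best

grid_with_min = {"a": [0.1, 0.3, 0.5], "b": [0.5, 0.7, 0.9]}      # contains (0.3, 0.7)
grid_without_min = {"a": [0.0, 0.5, 1.0], "b": [0.0, 0.5, 1.0]}   # does not contain it

loss_hit, params_hit = enumerate_grid(grid_with_min)
loss_miss, params_miss = enumerate_grid(grid_without_min)
print(loss_hit, params_hit)
print(loss_miss, params_miss)
```

The second grid can only return its closest candidate, however many times the search is repeated.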

        The larger and denser the parameter space, the more likely it is that some combination in it covers the minimum of the loss function. In the extreme case where the parameter space exhausts every possible value, grid search is guaranteed to find the parameter combination that minimizes the loss function, and the generalization ability of that combination will be at least as strong as anything manual tuning can reach. However, the larger the parameter space, the more computing power and time grid search needs. As the number of parameters grows, the amount of computation grows exponentially. Take random forest as an example:

• With a single parameter n_estimators and candidate values [50,100,150,200,250,300], 6 models must be fit.
• Adding the parameter max_depth with candidate values [2,3,4,5,6] raises this to 30 models.
• Adding the parameter min_samples_split with candidate values [2,3,4,5] raises this to 120 models.

In addition, the goal of tuning is to find the combination that gives the model the strongest generalization ability, so cross-validation is needed to estimate it. With 5-fold cross-validation, the three parameters above require 600 model fits. For artificial neural networks, stacked models, and ensemble models, which have many hyperparameters and potentially unbounded hyperparameter values, the time required for grid search grows dramatically as the data and models become more complex, and a single grid search can take days. We therefore need a more efficient hyperparameter search method. This article introduces an improved grid-based method, randomized grid search, and compares it with enumerated grid search in terms of run time, search space, and results.
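The counts above can be reproduced with a few lines of Python. This is only a bookkeeping sketch; the dictionary simply restates the candidate values from the random forest example, and `cv_folds` is the number of cross-validation folds assumed in the text:

```python
from math import prod

# Candidate values from the random forest example
grid = {
    "n_estimators": [50, 100, 150, 200, 250, 300],
    "max_depth": [2, 3, 4, 5, 6],
    "min_samples_split": [2, 3, 4, 5],
}
cv_folds = 5

names, sizes = [], []
for name, values in grid.items():
    names.append(name)
    sizes.append(len(values))
    # Number of combinations once this parameter has been added
    print(names, prod(sizes))

n_combinations = prod(sizes)          # size of the full parameter space
n_fits = n_combinations * cv_folds    # one fit per combination per fold
print(n_combinations, n_fits)
```

Every additional parameter multiplies the total, which is why the cost grows exponentially with the number of parameters.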

1.3 Enumeration Grid Search

import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.model_selection import cross_validate, KFold, GridSearchCV

# Load the encoded House Price training set (the path is specific to the author's machine)
data=pd.read_csv('F:\\Jupyter Files\\机器学习进阶\\集成学习\\datasets\\House Price\\train_encode.csv',encoding='utf-8')
data.drop('Unnamed: 0', axis=1, inplace=True)
X=data.iloc[:,:-1]   # features: every column except the last
y=data.iloc[:,-1]    # target: the last column

# Parameter space
# Note: "auto" for max_features is only accepted by older scikit-learn versions,
# and np.arange(0,5,10) yields the single value [0]
param_grid_simple = {"criterion": ["squared_error","poisson"]
                     , 'n_estimators': [*range(20,100,5)]
                     , 'max_depth': [*range(10,25,2)]
                     , "max_features": ["log2","sqrt",16,32,64,"auto"]
                     , "min_impurity_decrease": [*np.arange(0,5,10)]
                    }
# Count the combinations directly with a loop
no_option = 1
for i in param_grid_simple:
    no_option *= len(param_grid_simple[i])
no_option
# Output: 1536

# Model, cross-validation, grid search
reg = RFR(random_state=1412,verbose=True)
cv = KFold(n_splits=5,shuffle=True,random_state=1412)
search = GridSearchCV(estimator=reg
                     ,param_grid=param_grid_simple
                     ,scoring = "neg_mean_squared_error"
                     ,verbose = True
                     ,cv = cv)

# [TIME WARNING: 50.06 min]
start = time.time()
search.fit(X,y)
print(time.time() - start)
search.best_estimator_

abs(search.best_score_)**0.5 # best cross-validated result on the validation folds (RMSE)
# Output: 29251.284326350575

# Rebuild the model with the selected parameters and cross-validate it again
ad_reg = RFR(n_estimators=85, max_depth=23, max_features=16, random_state=1412)
cv = KFold(n_splits=5,shuffle=True,random_state=1412)
result_post_adjusted = cross_validate(ad_reg,X,y,cv=cv,scoring="neg_mean_squared_error"
                          ,return_train_score=True
                          ,verbose=True)

# Evaluation metric: RMSE, averaged over the folds
def RMSE(cvresult,key):
    return (abs(cvresult[key])**0.5).mean()
RMSE(result_post_adjusted,"train_score")
# Output: 11000.81099038192
RMSE(result_post_adjusted,"test_score")
# Output: 28572.070208366855
| HPO method | Enumerated grid search |
| --- | --- |
| Search space / global space | 1536/1536 |
| Run time (minutes) | 50.06 (single thread) |
| Best search score (RMSE) | 29251.284 |
| Rebuilt with best parameters (RMSE) | 28572.070 |

2. Randomized grid search RandomizedSearchCV

2.1 Basic principles of random grid search

        In the discussion of enumerated grid search, we noted that the time required grows dramatically as the data and models become more complex. Taking random forest as an example, with more than 10,000 samples the search time quickly grows to several hours. We therefore need a more efficient hyperparameter search method. Once the algorithm is fixed, two factors determine how fast enumerated grid search runs: ① the size of the parameter space (the larger the space, the more models must be fit); ② the size of the data (the more data, the more computing power and time each fit requires).

        Accordingly, the improvements to grid search offered by sklearn fall into two categories: adjusting the search space, and adjusting the amount of data used in each training run. Adjusting the search space means abandoning the global hyperparameter space that enumerated search must cover in full, selecting only some parameter combinations to build a hyperparameter subspace, and searching only within that subspace.

        Take the two-dimensional space in the figure below as an example. In this parameter space formed by n_estimators and max_depth, suppose n_estimators takes the values [50,100,150,200,250,300] and max_depth takes the values [2,3,4,5,6]. Enumerated grid search must then evaluate all 30 parameter combinations. When we adjust the search space, we can sample only the orange combinations as a "subspace" and search only those. The total amount of computation drops sharply: 30 model fits were needed originally, but now only 8 are.

fig, [ax1, ax2] = plt.subplots(1,2,dpi=300)
n_e_list = [*range(50,350,50)]
m_d_list = [*range(2,7)]
# All 30 combinations of n_estimators and max_depth
comb = pd.DataFrame([(n_estimators, max_depth) for n_estimators in n_e_list for max_depth in m_d_list])

# Left panel: the full grid
ax1.scatter(comb.iloc[:,0],comb.iloc[:,1])
ax1.set_xticks([*range(50,350,50)])
ax1.set_yticks([*range(2,7)])
ax1.set_xlabel("n_estimators")
ax1.set_ylabel("max_depth")
ax1.set_title("GridSearch")

# Right panel: the full grid with the 8 sampled combinations highlighted in orange
ax2.scatter(comb.iloc[:,0],comb.iloc[:,1])
ax2.scatter([50,250,200,200,300,100,150,150],[4,2,6,3,2,3,2,5],color="orange",s=20,linewidths=5)
ax2.set_xticks([*range(50,350,50)])
ax2.set_yticks([*range(2,7)])
ax2.set_xlabel("n_estimators")
ax2.set_ylabel("max_depth")
ax2.set_title("RandomSearch");

In sklearn, the method that randomly draws a parameter subspace and searches within it is called randomized grid search, RandomizedSearchCV. Because the search space is smaller, fewer parameter combinations have to be evaluated and compared, and the overall search time drops as well. Therefore:

① With the same global space, random search runs much faster than enumerated grid search.

② With the same number of model fits, random search can cover a much larger space than enumerated grid search.

③ At the same time, the minimum loss found by random grid search is very close to the minimum loss found by enumerated grid search.

In other words, it speeds up the computation without hurting the quality of the search too much. In practice, however, random grid search does not first sample a subspace and then search it. It works more like a loop: in each iteration, one set of parameters is drawn at random and a model is fit, and in the next iteration another set is drawn and fit. Because this sampling is done without replacement, the same parameter combination is never selected twice. By controlling the number of iterations, we control the size of the sampled parameter subspace. This approach is often described as "giving random grid search a fixed computation budget; once the budget is used up, the search stops".
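The budgeted loop described above can be sketched in plain Python. This is an illustrative stand-in, not sklearn's implementation: `random_search` and `toy_score` are hypothetical names, the loss is invented, and the candidates are drawn up front with `random.sample`, which has the same effect as drawing one new combination per iteration without replacement:

```python
import random
from itertools import product

def random_search(grid, score_fn, n_iter, seed=0):
    # Draw n_iter distinct combinations from the grid and keep the best one
    names = list(grid)
    all_combos = list(product(*(grid[name] for name in names)))
    rng = random.Random(seed)
    budget = min(n_iter, len(all_combos))   # cannot try more than the full space
    tried, best = [], None
    for combo in rng.sample(all_combos, budget):   # sampling without replacement
        params = dict(zip(names, combo))
        score = score_fn(**params)
        tried.append(combo)
        if best is None or score < best[0]:
            best = (score, params)
    return best, tried

def toy_score(n_estimators, max_depth):
    # Hypothetical loss, lowest at n_estimators=200 and max_depth=5
    return abs(n_estimators - 200) / 50 + abs(max_depth - 5)

grid = {"n_estimators": [50, 100, 150, 200, 250, 300], "max_depth": [2, 3, 4, 5, 6]}
best, tried = random_search(grid, toy_score, n_iter=8)
print(len(tried), best)
```

Raising `n_iter` enlarges the subspace; once it reaches the size of the grid, the loop visits every combination, exactly as enumerated grid search does.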

2.2 Implementation of random grid search

from sklearn.model_selection import RandomizedSearchCV

class sklearn.model_selection.RandomizedSearchCV(estimator, param_distributions, *, n_iter=10, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score=nan, return_train_score=False)

All parameters are explained below. The parameters specific to random grid search are param_distributions, n_iter and random_state:

| Name | Description |
| --- | --- |
| estimator | The object to be tuned: an estimator |
| param_distributions | The global parameter space; a dictionary or a list of dictionaries |
| n_iter | The number of iterations; the more iterations, the larger the sampled parameter subspace |
| scoring | The evaluation metric; several metrics can be computed at the same time |
| n_jobs | The number of jobs run in parallel |
| refit | Which evaluation metric to use for choosing the best parameters, which are then used to retrain on the full dataset |
| cv | Cross-validation folds |
| verbose | The verbosity of the log output |
| pre_dispatch | The number of jobs dispatched during parallel execution |
| random_state | Random seed |
| error_score | The score assigned when a fit raises an error. With 'raise' the error is raised and training is interrupted; otherwise a warning is shown and training continues |
| return_train_score | Whether to include training-set scores in the cross-validation results |
# Wrap the recurring steps in functions for later use
# Evaluation metric: RMSE, averaged over the folds
def RMSE(cvresult,key):
    return (abs(cvresult[key])**0.5).mean()

# Compute the size of a parameter space
def count_space(param):
    no_option = 1
    for i in param:
        no_option *= len(param[i])
    print(no_option)
    
# Rebuild the model with the best parameters and validate the result
def rebuild_on_best_param(ad_reg):
    cv = KFold(n_splits=5,shuffle=True,random_state=1412)
    result_post_adjusted = cross_validate(ad_reg,X,y,cv=cv,scoring="neg_mean_squared_error"
                                          ,return_train_score=True
                                          ,verbose=True)
    print("Training RMSE: {:.3f}".format(RMSE(result_post_adjusted,"train_score")))
    print("Testing RMSE: {:.3f}".format(RMSE(result_post_adjusted,"test_score")))
# The same global parameter space as before
param_grid_simple = {"criterion": ["squared_error","poisson"]
                     , 'n_estimators': [*range(20,100,5)]
                     , 'max_depth': [*range(10,25,2)]
                     , "max_features": ["log2","sqrt",16,32,64,"auto"]
                     , "min_impurity_decrease": [*np.arange(0,5,10)]
                    }
# Size of the global parameter space: the largest number of combinations we can sample
count_space(param_grid_simple)
# Output: 1536

# Build the regressor and the cross-validation splitter
reg = RFR(random_state=1412,verbose=True)
cv = KFold(n_splits=5,shuffle=True,random_state=1412)
# Define the random search
search = RandomizedSearchCV(estimator=reg
                            ,param_distributions=param_grid_simple
                            ,n_iter = 800 # the subspace is roughly half the global space
                            ,scoring = "neg_mean_squared_error"
                            ,verbose = True
                            ,cv = cv
                            ,random_state=1412
                           )
# Fit the random search
#=====[TIME WARNING: 1297.368000984192s]=====#
import time
start = time.time()
search.fit(X,y)
print(time.time() - start)
# Inspect the result
search.best_estimator_

abs(search.best_score_)**0.5
# Output: 29251.284326350575

# Rebuild the model with the best parameters
ad_reg = RFR(max_depth=24, max_features=16, min_impurity_decrease=0,
                      n_estimators=85,  random_state=1412,
                      verbose=True)
rebuild_on_best_param(ad_reg)
# Output:
# Training RMSE: 11031.299
# Testing RMSE: 28639.969

| HPO method | Enumerated grid search | Random grid search |
| --- | --- | --- |
| Search space / global space | 1536/1536 | 800/1536 |
| Run time (minutes) | 50.06 (single thread) | 21.62 (single thread) (↓) |
| Best search score (RMSE) | 29179.698 | 29251.284 |
| Rebuilt with best parameters (RMSE) | 28572.070 | 28639.969 (↑) |

Clearly, with the same parameter space and the same model, random grid search needs roughly half the running time of ordinary grid search (21.62 versus 50.06 minutes). This follows directly from the subspace being about half the size of the global space. Because random search only reduces the number of combinations tried and does not change the search procedure itself, its running time is roughly n_iter / (number of combinations in the global space) × the running time of grid search.
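As a rough check of this rule of thumb, the figures reported in the table can be plugged in directly. The sketch only does arithmetic on the numbers already given above; the variable names are illustrative:

```python
# Figures reported in the comparison table
grid_minutes = 50.06       # enumerated grid search over all 1536 combinations
global_space = 1536
n_iter = 800               # combinations tried by random grid search
observed_minutes = 21.62   # measured run time of random grid search

# Rule of thumb: run time scales with the share of the space that is searched
estimated_minutes = n_iter / global_space * grid_minutes
print(round(estimated_minutes, 2))                 # estimate from the rule of thumb
print(round(observed_minutes / grid_minutes, 2))   # observed share of the grid search time
```

The estimate comes out at about 26 minutes, somewhat above the measured 21.62 minutes, so the rule should be read as an order-of-magnitude guide rather than an exact prediction.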

2.3 Theoretical Limits of Random Grid Search

        Shrinking the search to a subspace speeds it up, yet the accuracy of random grid search does not seem to suffer much. Can random grid search achieve results as good as grid search? Can it, like grid search, find the optimal parameter combination? And why do its results remain so close to those of grid search after the parameter space has been reduced?

        Many machine learning methods use randomness to speed up computation (for example KMeans, which randomly picks samples as initial cluster centers, or mini-batch stochastic gradient descent, which uses random sampling to reduce the samples needed per iteration), and others use randomness to improve model performance (for example random forests and extremely randomized trees). The principles behind these two kinds of randomness are completely different, and random grid search belongs to the first kind. Methods of this kind always involve "sampling from a full dataset or full domain", and the fundamental reasons this sampling works are:

① The sampled subspace reflects the distribution of the global space to some extent, and the larger the subspace (the more parameter combinations it contains), the closer its distribution is to that of the global space.

② When the global space itself is dense enough, even a small subspace can have a distribution similar to that of the global space.

③ If the global space contains the theoretical minimum of the loss function, then a subspace whose distribution closely resembles that of the global space is likely to contain that minimum as well, or to contain a series of near-minimum values that are very close to it.
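Point ③ can be made quantitative under one simplifying assumption: the subspace is a uniform sample, drawn without replacement, from a finite global space. The chance that it contains at least one of the k best combinations is then 1 - C(N-k, n) / C(N, n). The sketch below evaluates this for a space of 1536 combinations; the function name and the choice of k=15 (about the best 1%) are illustrative:

```python
from math import comb

def prob_hit_top_k(space_size, n_sampled, k):
    # Probability that a uniform sample of n_sampled combinations (drawn without
    # replacement) contains at least one of the k best combinations
    return 1 - comb(space_size - k, n_sampled) / comb(space_size, n_sampled)

space_size = 1536   # size of the global space
top_k = 15          # roughly the best 1% of all combinations

sample_sizes = (10, 50, 200, 800)
probs = [prob_hit_top_k(space_size, n, top_k) for n in sample_sizes]
for n, p in zip(sample_sizes, probs):
    print(n, round(p, 3))
```

The probability rises quickly with the size of the subspace, which is consistent with a random search over about half the space landing very close to the grid search optimum.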

Here we use a built-in test dataset from mplot3d, part of the matplotlib toolkit, to visualize these abstract ideas:

from mpl_toolkits.mplot3d import axes3d
p1, p2, MSE = axes3d.get_test_data(0.05)
len(p1) # 120: parameter 1 takes 120 values
len(p2) # 120: parameter 2 takes 120 values
MSE.shape # (120, 120): loss values, 14400 points in total
# Plot the parameter space of p1 and p2: a dense space of 14400 points
plt.figure(dpi=300)
plt.scatter(p1,p2,s=0.2)
plt.xticks(fontsize=9)
plt.yticks(fontsize=9);

 ps: The function get_test_data generates a ready-made test dataset. Here we treat it as having two parameters, p1 and p2, and take the value attached to each parameter combination as the loss function value MSE. The argument 0.05 is the spacing between points in the parameter space: the smaller it is, the more points are generated.

# The surface formed by the parameters and the loss
p1, p2, MSE = axes3d.get_test_data(0.05)
plt.figure(dpi=300)
ax = plt.axes(projection="3d")
ax.plot_wireframe(p1,p2,MSE,rstride=2,cstride=2,linewidth=0.5)
ax.view_init(2, -15)
ax.zaxis.set_tick_params(labelsize=7)
ax.xaxis.set_tick_params(labelsize=7)
ax.yaxis.set_tick_params(labelsize=7);

np.min(MSE) # the smallest MSE attainable in the whole parameter space
# Output: -73.39620971601681

Now extract a subspace from this space:

ps: We draw n combinations from the space; the larger n is, the larger the subspace. There are 14,400 combinations in total. Points that are drawn keep their loss value MSE, and points that are not drawn get a null loss value. So we only need to find the points that were not selected and set their MSE to NaN.

import numpy as np
n = 100
# Generate (14400-n) random indices in [0, 14400) marking the points left out of the subspace.
# Note: randint draws with replacement, so indices repeat and more than n points survive;
# np.random.choice(14400, 14400-n, replace=False) would keep exactly n points.
unsampled = np.random.randint(0,14400,14400-n)
p1, p2, MSE = axes3d.get_test_data(0.05)

# Flatten MSE and set the loss of every unsampled point to NaN
MSE = MSE.ravel()
MSE[unsampled] = np.nan
MSE = MSE.reshape((120,120))
# After setting the NaN values, restore the original shape of MSE, otherwise plotting fails

# The surface formed by the parameters and the loss
plt.figure(dpi=300)
ax = plt.axes(projection="3d")
ax.view_init(2, -15)
ax.plot_wireframe(p1,p2,MSE,rstride=2,cstride=2,linewidth=0.5)
ax.zaxis.set_tick_params(labelsize=7)
ax.xaxis.set_tick_params(labelsize=7)
ax.yaxis.set_tick_params(labelsize=7);

# Find the minimum on the current loss surface
# MSE now contains NaN values, so they must be ignored; a plain min would return NaN
print(np.nanmin(MSE))
# Output: -73.24243733589367

 The images confirm the three points listed earlier: the sampled subspace reflects the distribution of the global space, and does so more closely the larger it is; when the global space is dense enough, even a small subspace has a similar distribution; and the subspace contains values very close to the global minimum (here -73.24 in the subspace against -73.40 in the full space).

        Moreover, because random grid search is faster, the same computing resources allow it to work on a larger or denser global space, so random search may even achieve better results than grid search:

# Create the parameter space: make the overall space denser
param_grid_simple = {'n_estimators': [*range(80,100,1)]
                     , 'max_depth': [*range(10,25,1)]
                     , "max_features": [*range(10,20,1)]
                     , "min_impurity_decrease": [*np.arange(0,5,10)]
                    }
# Size of the global parameter space: the largest number of combinations we can sample
count_space(param_grid_simple)  #3000
# Build the regressor and the cross-validation splitter
reg = RFR(random_state=1412,verbose=True)
cv = KFold(n_splits=5,shuffle=True,random_state=1412)

# Define the random search
search = RandomizedSearchCV(estimator=reg
                            ,param_distributions=param_grid_simple
                            ,n_iter = 1536 # the same number of candidates as the enumerated grid search
                            ,scoring = "neg_mean_squared_error"
                            ,verbose = True
                            ,cv = cv
                            ,random_state=1412)
# Fit the random search
#=====[TIME WARNING]=====#
start = time.time()
search.fit(X,y)
end = time.time() - start
print(end/60) #33.02493980725606min
# Inspect the best estimator
search.best_estimator_

abs(search.best_score_)**0.5 # best cross-validated result on the validation folds (RMSE)
# Output: 29012.90569846546
rebuild_on_best_param(search.best_estimator_)
# Output:
# Training RMSE: 11208.818
# Testing RMSE: 28346.673
| HPO method | Enumerated grid search | Random grid search | Random grid search (large space) |
| --- | --- | --- | --- |
| Search space / global space | 1536/1536 | 800/1536 | 1536/3000 |
| Run time (minutes) | 50.06 (single thread) | 21.62 (single thread) (↓) | 33.02 (single thread) (↓) |
| Best search score (RMSE) | 29179.698 | 29251.284 | 29012.906 (↓) |
| Rebuilt with best parameters (RMSE) | 28572.070 | 28639.969 (↑) | 28346.673 (↓) |

When the global parameter space grows, random grid search can explore a denser and larger space in less time than grid search needs on the small space, and thereby obtain better results. Besides tolerating a larger parameter space, random grid search can also accept continuous variables in the parameter space.

2.4 Implementation of continuous random grid search

Continuous parameter space:

For grid search, the points in the parameter space are evenly distributed and evenly spaced, because grid search cannot draw values from a "distribution" and can only combine the candidate values it is given. Random search, by contrast, accepts a "distribution" as input. As shown in the figure above, if the lowest point of the loss function happens to lie between two sets of grid parameters, enumerated grid search cannot possibly find the minimum. Random grid search, however, draws parameter points at random from a distribution, so it has a better chance of finding good values within the same parameter range.
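A toy one-dimensional case makes the difference concrete. The loss `toy_loss` and its minimum at 0.37 are hypothetical, and the random search is deliberately given more draws than the grid has points, so the sketch only shows that sampling from a continuous range can reach values an evenly spaced grid cannot:

```python
import random

def toy_loss(x):
    # Hypothetical loss whose minimum (x = 0.37) falls between two grid points
    return (x - 0.37) ** 2

# Evenly spaced grid: 0.37 is not one of the candidates
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
grid_best = min(grid, key=toy_loss)

# Random search: draw candidates from a continuous uniform distribution on [0, 1]
rng = random.Random(1412)
samples = [rng.uniform(0.0, 1.0) for _ in range(200)]
random_best = min(samples, key=toy_loss)

print(grid_best, toy_loss(grid_best))
print(random_best, toy_loss(random_best))
```

The grid can do no better than its nearest point, 0.25, whereas the random draws are free to fall arbitrarily close to the minimum.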

import scipy.stats # use scipy to build the distribution (the stats submodule is imported explicitly)
scipy.stats.uniform(loc=1,scale=100)

ps: uniform is the uniform distribution. By default it generates numbers in [0,1]; loc sets the starting point and scale sets the width of the interval, so values fall in [loc, loc + scale]. Other distributions can be used as well, such as the exponential distribution, the gamma distribution, or randint. Note that scipy does not generate a set of discrete numbers the way np.linspace() does; it returns a distribution object. We also did not specify a size for the distribution object: how many candidate values are drawn from the distribution is decided by the random search itself. In theory, the larger n_iter is, the more points may be drawn from each parameter's distribution. Consequently, when the parameter space contains a distribution, the size of the global parameter space cannot be computed.
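How a search might draw from a space that mixes lists and distribution objects can be sketched as follows. This is a simplified illustration and not sklearn's code: `UniformDist` is a hypothetical stand-in for a frozen scipy distribution (its `rvs` takes a random generator, unlike scipy's signature), and `sample_params` is an invented helper. Lists are sampled by picking one element, distribution objects by calling `rvs`:

```python
import random

class UniformDist:
    # Hypothetical stand-in for a frozen distribution object; like
    # scipy.stats.uniform(loc, scale) it covers the interval [loc, loc + scale]
    def __init__(self, loc=0.0, scale=1.0):
        self.loc, self.scale = loc, scale

    def rvs(self, rng):
        return rng.uniform(self.loc, self.loc + self.scale)

def sample_params(space, n_iter, seed=0):
    # One parameter set per iteration: rvs() for distributions, choice() for lists
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        params = {}
        for name, candidates in space.items():
            if hasattr(candidates, "rvs"):
                params[name] = candidates.rvs(rng)
            else:
                params[name] = rng.choice(candidates)
        draws.append(params)
    return draws

space = {"max_depth": [10, 12, 14], "min_impurity_decrease": UniformDist(0, 50)}
draws = sample_params(space, n_iter=5)
for params in draws:
    print(params)
```

The number of distinct values taken from the distribution is set only by `n_iter`, which is why the size of such a space cannot be counted in advance.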

        In the random forest tuning so far, all the parameters we searched accept only positive integers. scipy.stats.randint can be used for them, but randint is not a continuous distribution in the strict sense. Strictly speaking, continuous search is better suited to parameters such as the learning rate, C, and alpha (unbounded above and mostly floating-point). Among the random forest parameters, the one closest to this description is min_impurity_decrease, the minimum impurity decrease required for a decision tree to split a node. We now run a random grid search in which this parameter is drawn from a uniform distribution.

param_grid_simple = {'n_estimators': [*range(80,100,1)]
                     , 'max_depth': [*range(10,25,1)]
                     , "max_features": [*range(10,20,1)]
                     , "min_impurity_decrease": scipy.stats.uniform(0,50) # continuous values in [0, 50]
                    }
# Build the regressor and the cross-validation splitter
reg = RFR(random_state=1412,verbose=True)
cv = KFold(n_splits=5,shuffle=True,random_state=1412)

# Define the random search
search = RandomizedSearchCV(estimator=reg
                            ,param_distributions=param_grid_simple
                            ,n_iter = 1536 # again 1536 candidates
                            ,scoring = "neg_mean_squared_error"
                            ,verbose = True
                            ,cv = cv
                            ,random_state=1412)
# Fit the random search
#=====[TIME WARNING]=====#
start = time.time()
search.fit(X,y)
end = time.time() - start
print(end/60) #32.72446912527084min
# Inspect the best estimator
search.best_estimator_

abs(search.best_score_)**0.5
# Output: 29113.405359664695
rebuild_on_best_param(search.best_estimator_)
# Output:
# Training RMSE: 11296.682
# Testing RMSE: 28455.434
| HPO method | Enumerated grid search | Random grid search | Random grid search (large space) | Random grid search (continuous) |
| --- | --- | --- | --- | --- |
| Search space / global space | 1536/1536 | 800/1536 | 1536/3000 | 1536/unlimited |
| Run time (minutes) | 50.06 (single thread) | 21.62 (single thread) (↓) | 33.02 (single thread) (↓) | 32.72 (single thread) |
| Best search score (RMSE) | 29179.698 | 29251.284 | 29012.906 (↓) | 29113.405 |
| Rebuilt with best parameters (RMSE) | 28572.070 | 28639.969 (↑) | 28346.673 (↓) | 28455.434 |

In this search, the best min_impurity_decrease value was already known to be 0, so forcing the search over a much wider range can make the model worse, because the point min_impurity_decrease=0 is not drawn. However, min_impurity_decrease is the only random forest parameter here that can be searched with a distribution, so we accept this loss in performance.

        In theory, when the global parameter space used by enumerated grid search is large and dense enough, its optimum is the upper limit of what random grid search can achieve, so random grid search cannot obtain better results than enumerated grid search on the same space. In practice, however, enumerated grid search is so slow that its global parameter space can rarely be made very large or very dense, so its results seldom come close to the theoretical optimum. When random grid search is given a larger and denser space, it captures the distribution of a wider region and therefore has a real chance of reaching the theoretical optimum.


Origin blog.csdn.net/weixin_60200880/article/details/131859162