1. Overview
When training deep learning models, tuning parameters is tedious. Grid search and manual search are the usual approaches, so tuning can feel like Taishang Laojun refining elixirs: somewhat mysterious.
Is there a tool that can tune parameters automatically? Yes: Hyperopt, the tool introduced in this section, is built for exactly this purpose.
Hyperopt is a Python library for serial and parallel optimization over complex search spaces that may include real-valued, discrete, and conditional dimensions.
Hyperopt currently implements three algorithms:
Random Search
Tree of Parzen Estimators (TPE)
Adaptive TPE
Hyperopt was designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees, but these algorithms are not currently implemented. All algorithms can be parallelized in the following two ways:
Apache Spark
MongoDB
The first is a big-data processing engine and the second is a distributed database.
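To make the simplest of the three algorithms concrete, here is a minimal sketch of random search in plain Python. It is not Hyperopt's implementation; the names objective and random_search are hypothetical and exist only for this illustration.

```python
import random


def objective(x):
    """Toy loss with its minimum at x = 3."""
    return (x - 3) ** 2


def random_search(fn, low, high, n_evals, seed=0):
    """Evaluate fn at n_evals uniformly drawn points and keep the best one."""
    rng = random.Random(seed)
    best_x, best_loss = None, float("inf")
    for _ in range(n_evals):
        x = rng.uniform(low, high)
        loss = fn(x)
        if loss < best_loss:
            best_x, best_loss = x, loss
    return best_x, best_loss


best_x, best_loss = random_search(objective, -10, 10, n_evals=1000)
print(f"best x = {best_x:.3f}, loss = {best_loss:.5f}")
```

Random search ignores earlier results when it picks the next point. TPE differs in that it uses the history of evaluated points to decide where to sample next.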
2. Install hyperopt
Installation (using the Douban mirror is still recommended):
pip3 install --user hyperopt -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
You can see that the following modules are installed:
Successfully built future
Installing collected packages: zipp, numpy, importlib-resources, decorator, tqdm, scipy, py4j, networkx, future, cloudpickle, hyperopt
Successfully installed cloudpickle-2.2.1 decorator-4.4.2 future-0.18.3 hyperopt-0.2.7 importlib-resources-5.4.0 networkx-2.5.1 numpy-1.19.5 py4j-0.10.9.7 scipy-1.5.4 tqdm-4.64.1 zipp-3.6.0
3. Test
3.1. hyperopt_test.py
After installing it, let's test an example:
gedit hyperopt_test.py
from hyperopt import fmin, tpe, space_eval, hp

def objective(args):
    case, val = args
    if case == 'case 1':
        return val
    else:
        return val ** 2

# define a search space
space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

# minimize the objective over the space
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
print(best)
# best holds the index of the chosen option; space_eval maps it back to the actual point
print(space_eval(space, best))

# second example: minimize x ** 2 over the interval [-8, -2]
best2 = fmin(fn=lambda x: x ** 2,
    space=hp.uniform('x', -8, -2),
    algo=tpe.suggest,
    max_evals=200)
print(best2)
run:
python3 hyperopt_test.py
'''
100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 269.38trial/s, best loss: 6.787702954398033e-05]
{'a': 1, 'c2': -0.008238751698162794}
('case 2', -0.008238751698162794)
100%|██████████████████████████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 335.54trial/s, best loss: 4.000953693453848]
{'x': -2.000238409153731}
'''
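Here objective is the objective function. fmin searches the space for the arguments that minimize it and returns them as a dictionary keyed by the hyperparameter labels. For an hp.choice dimension the dictionary holds the index of the selected option (here 'a': 1, the second option), which is why space_eval(space, best) is used to recover the actual point ('case 2', -0.0082...).
A progress bar is shown during the search (it can be disabled with verbose=False). The first call (best) runs 100 evaluations and the second (best2) runs 200, and each progress bar also reports the best loss found. In the second call the search range is [-8, -2], so the minimum of x ** 2 lies at the boundary x = -2 with a loss of about 4.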
3.2. fmin function
The key piece here is the fmin function. Its documentation, as printed by help(fmin), is:
fmin(fn, space, algo=None, max_evals=None, timeout=None, loss_threshold=None, trials=None, rstate=None, allow_trials_fmin=True, pass_expr_memo_ctrl=None, catch_eval_exceptions=False, verbose=True, return_argmin=True, points_to_evaluate=None, max_queue_len=1, show_progressbar=True, early_stop_fn=None, trials_save_file='')
Minimize a function over a hyperparameter space.
More realistically: *explore* a function over a hyperparameter space
according to a given algorithm, allowing up to a certain number of
function evaluations. As points are explored, they are accumulated in
`trials`
Parameters
----------
fn : callable (trial point -> loss)
This function will be called with a value generated from `space`
as the first and possibly only argument. It can return either
a scalar-valued loss, or a dictionary. A returned dictionary must
contain a 'status' key with a value from `STATUS_STRINGS`, must
contain a 'loss' key if the status is `STATUS_OK`. Particular
optimization algorithms may look for other keys as well. An
optional sub-dictionary associated with an 'attachments' key will
be removed by fmin its contents will be available via
`trials.trial_attachments`. The rest (usually all) of the returned
dictionary will be stored and available later as some 'result'
sub-dictionary within `trials.trials`.
space : hyperopt.pyll.Apply node or "annotated"
The set of possible arguments to `fn` is the set of objects
that could be created with non-zero probability by drawing randomly
from this stochastic program involving involving hp_<xxx> nodes
(see `hyperopt.hp` and `hyperopt.pyll_utils`).
If set to "annotated", will read space using type hint in fn. Ex:
(`def fn(x: hp.uniform("x", -1, 1)): return x`)
algo : search algorithm
This object, such as `hyperopt.rand.suggest` and
`hyperopt.tpe.suggest` provides logic for sequential search of the
hyperparameter space.
max_evals : int
Allow up to this many function evaluations before returning.
timeout : None or int, default None
Limits search time by parametrized number of seconds.
If None, then the search process has no time constraint.
loss_threshold : None or double, default None
Limits search time when minimal loss reduced to certain amount.
If None, then the search process has no constraint on the loss,
and will stop based on other parameters, e.g. `max_evals`, `timeout`
trials : None or base.Trials (or subclass)
Storage for completed, ongoing, and scheduled evaluation points. If
None, then a temporary `base.Trials` instance will be created. If
a trials object, then that trials object will be affected by
side-effect of this call.
rstate : numpy.random.Generator, default numpy.random or `$HYPEROPT_FMIN_SEED`
Each call to `algo` requires a seed value, which should be different
on each call. This object is used to draw these seeds via `randint`.
The default rstate is
`numpy.random.default_rng(int(env['HYPEROPT_FMIN_SEED']))`
if the `HYPEROPT_FMIN_SEED` environment variable is set to a non-empty
string, otherwise np.random is used in whatever state it is in.
verbose : bool
Print out some information to stdout during search. If False, disable
progress bar irrespectively of show_progressbar argument
allow_trials_fmin : bool, default True
If the `trials` argument
pass_expr_memo_ctrl : bool, default False
If set to True, `fn` will be called in a different more low-level
way: it will receive raw hyperparameters, a partially-populated
`memo`, and a Ctrl object for communication with this Trials
object.
return_argmin : bool, default True
If set to False, this function returns nothing, which can be useful
for example if it is expected that `len(trials)` may be zero after
fmin, and therefore `trials.argmin` would be undefined.
points_to_evaluate : list, default None
Only works if trials=None. If points_to_evaluate equals None then the
trials are evaluated normally. If list of dicts is passed then
given points are evaluated before optimisation starts, so the overall
number of optimisation steps is len(points_to_evaluate) + max_evals.
Elements of this list must be in a form of a dictionary with variable
names as keys and variable values as dict values. Example
points_to_evaluate value is [{'x': 0.0, 'y': 0.0}, {'x': 1.0, 'y': 2.0}]
Returns
-------
argmin : dictionary
If return_argmin is True returns `trials.argmin` which is a dictionary. Otherwise
this function returns the result of `hyperopt.space_eval(space, trails.argmin)` if there
were successfull trails. This object shares the same structure as the space passed.
If there were no successfull trails, it returns None.
max_queue_len : integer, default 1
Sets the queue length generated in the dictionary or trials. Increasing this
value helps to slightly speed up parallel simulatulations which sometimes lag
on suggesting a new trial.
show_progressbar : bool or context manager, default True (or False is verbose is False).
Show a progressbar. See `hyperopt.progress` for customizing progress reporting.
early_stop_fn: callable ((result, *args) -> (Boolean, *args)).
Called after every run with the result of the run and the values returned by the function previously.
Stop the search if the function return true.
Default None.
trials_save_file: str, default ""
Optional file name to save the trials object to every iteration.
If specified and the file already exists, will load from this file when
trials=None instead of creating a new base.Trials object
3.3. Visualization function
Let's look at another example, y = (x-3)². First plot the function so it is easier to picture:
import numpy as np
import matplotlib.pylab as plt
x=np.linspace(-10,16)
y=(x-3)**2
plt.xlabel('x')
plt.ylabel('y')
plt.plot(x,y,'r--',label='(x-3)**2')
plt.title("y=(x-3)**2")
#plt.legend()
plt.show()
The resulting plot is a parabola with its lowest point at x = 3.
The function is minimized at x = 3, which is also obvious from the formula without any plot. Now let fmin find it:
from hyperopt import fmin, hp, tpe

best = fmin(
    fn=lambda x: (x-3)**2,
    space=hp.uniform('x', -10, 10),
    algo=tpe.suggest,
    max_evals=100)
print(best)
#{'x': 2.967563715953902}
Try raising max_evals to 1000 and compare the result; it will typically be closer to 3.
3.4. HP range value
space defines the search range. The hp module provides the following sampling functions:
'choice', 'lognormal', 'loguniform', 'normal', 'pchoice', 'qlognormal', 'qloguniform', 'qnormal', 'quniform', 'randint', 'uniform', 'uniformint'
Note that hp.normal is unbounded: its samples cannot be restricted to a range. Let's do a comparison test:
from hyperopt import hp
import hyperopt.pyll.stochastic
space = {
'x':hp.uniform('x', 0, 1),
'y':hp.normal('y', 0, 1),
'z':hp.randint('z',0,10),
'c':hp.choice('City', ['GuangZhou','ShangHai', 'BeiJing']),
}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'GuangZhou', 'x': 0.38603237555669656, 'y': -0.19782139601114704, 'z': array(1)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'ShangHai', 'x': 0.7838648171908386, 'y': 0.43014722187588245, 'z': array(8)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'BeiJing', 'x': 0.5137264208587933, 'y': -0.10021079359026988, 'z': array(4)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'BeiJing', 'x': 0.7201793839228087, 'y': 0.11571302115909506, 'z': array(0)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'GuangZhou', 'x': 0.21906317438496536, 'y': -1.645732195658909, 'z': array(0)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'ShangHai', 'x': 0.17319873908122796, 'y': -0.7472225692827178, 'z': array(4)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'GuangZhou', 'x': 0.4376348587045986, 'y': 0.7303201600143362, 'z': array(7)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'BeiJing', 'x': 0.43311251571433906, 'y': 1.216596288611056, 'z': array(1)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'BeiJing', 'x': 0.17755989388617366, 'y': 0.3168677593459059, 'z': array(4)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'GuangZhou', 'x': 0.6058631246917083, 'y': -0.2849664724345445, 'z': array(1)}
In these samples, y (drawn from the normal distribution) takes negative values and is not confined to any interval, while x, z, and c all stay within their defined ranges.
3.5. Trials tracking
A Trials object records information about every evaluation made during the search. Let's look at an example:
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

fspace = {
    'x': hp.uniform('x', -5, 5)
}

def f(params):
    x = params['x']
    val = (x-3)**2
    # returning a dict lets hyperopt store the loss and status of each trial
    return {'loss': val, 'status': STATUS_OK}

trials = Trials()
best = fmin(fn=f, space=fspace, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
#{'x': 2.842657137743265}

# inspect the first five recorded trials
for trial in trials.trials[:5]:
    print(trial)
'''
{'state': 2, 'tid': 0, 'spec': None, 'result': {'loss': 12.850632865897229, 'status': 'ok'}, 'misc': {'tid': 0, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [0]}, 'vals': {'x': [-0.5847779381570106]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 615000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 615000)}
{'state': 2, 'tid': 1, 'spec': None, 'result': {'loss': 23.862240347848957, 'status': 'ok'}, 'misc': {'tid': 1, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [1]}, 'vals': {'x': [-1.884899215730961]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000)}
{'state': 2, 'tid': 2, 'spec': None, 'result': {'loss': 42.84157056715999, 'status': 'ok'}, 'misc': {'tid': 2, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [2]}, 'vals': {'x': [-3.545347245728067]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000)}
{'state': 2, 'tid': 3, 'spec': None, 'result': {'loss': 0.8412634189024095, 'status': 'ok'}, 'misc': {'tid': 3, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [3]}, 'vals': {'x': [3.9172041315336568]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000)}
{'state': 2, 'tid': 4, 'spec': None, 'result': {'loss': 30.580983627886543, 'status': 'ok'}, 'misc': {'tid': 4, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [4]}, 'vals': {'x': [-2.5300075612865616]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 618000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 618000)}
'''
We can plot this per-trial information to make it easier to read:
import matplotlib.pyplot as plt
# each t['misc']['vals']['x'] is a one-element list, so take element 0
x=[t['misc']['vals']['x'][0] for t in trials.trials]
y=[t['result']['loss'] for t in trials.trials]
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(x,y,c='r')
plt.show()
The resulting scatter plot shows the loss of each trial against x. The smallest losses are obtained near x = 3.
A note on naming: Trials can look like a misspelling of Trails (traces), but it is the intended word. Each evaluation of the objective function is a trial, meaning an attempt, and the Trials object is the record of all of them.
4. Practical application
With these basics in place, let's try some real use cases. First, a K-nearest-neighbors example on the iris data set (150 samples in three classes: setosa, versicolor, virginica):
4.1. K nearest neighbor KNN
Install scikit-learn first; skip this step if it is already installed.
pip3 install --user scikit-learn -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
Next up is the code:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe

iris=load_iris()
X=iris.data
y=iris.target

def hyperopt_train(params):
    clf=KNeighborsClassifier(**params)
    # mean cross-validation accuracy
    return cross_val_score(clf,X,y).mean()

space_knn={'n_neighbors':hp.choice('n_neighbors',range(1,50))}

def f(params):
    acc=hyperopt_train(params)
    # fmin minimizes, so return the negative accuracy as the loss
    return {'loss':-acc,'status':STATUS_OK}

trials=Trials()
best=fmin(f,space_knn,algo=tpe.suggest,max_evals=100,trials=trials)
print(best)
#{'n_neighbors': 6}
# for hp.choice this is an index into range(1,50), so it means n_neighbors=7
As before, plot the results to get a feel for them:
import matplotlib.pyplot as plt
# hp.choice stores the index of the chosen option; index + 1 is the actual n_neighbors
x=[t['misc']['vals']['n_neighbors'][0]+1 for t in trials.trials]
y=[-t['result']['loss'] for t in trials.trials]
plt.xlabel('n_neighbors')
plt.ylabel('cross_val_score')
plt.scatter(x,y,c='r')
plt.show()
4.2. Support vector classification SVC
Next, support vector classification (SVC) on the same iris data set:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from sklearn.svm import SVC

iris=load_iris()
X=iris.data
y=iris.target

def hyperopt_train_test(params):
    clf =SVC(**params)
    return cross_val_score(clf, X, y).mean()

space_svm = {
    'C': hp.uniform('C', 0, 20),
    'kernel': hp.choice('kernel', ['linear', 'sigmoid', 'poly', 'rbf']),
    'gamma': hp.uniform('gamma', 0, 20),
}

def f(params):
    acc = hyperopt_train_test(params)
    # fmin minimizes, so return the negative accuracy as the loss
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()
best = fmin(f, space_svm, algo=tpe.suggest, max_evals=100, trials=trials)
print(best)
#{'C': 0.8930681939735963, 'gamma': 8.379245134714441, 'kernel': 0}
# 'kernel': 0 is the index of the chosen option, i.e. 'linear'
Plot the results in the same way to see the effect:
from matplotlib import pyplot as plt
parameters = ['C', 'kernel', 'gamma']
cols = len(parameters)
fig, axes = plt.subplots(1,cols)
for i, val in enumerate(parameters):
    # for 'kernel' the x values are option indices (0 = 'linear', ..., 3 = 'rbf')
    xs = [t['misc']['vals'][val] for t in trials.trials]
    ys = [-t['result']['loss'] for t in trials.trials]
    axes[i].scatter(xs, ys, c="g")
    axes[i].set_title(val)
    axes[i].set_ylim([0.9, 1.0])
plt.show()
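The resulting figure has one scatter plot per parameter (C, kernel, gamma), with the cross-validation accuracy on the y-axis.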
4.3. Decision Tree DecisionTree
Now let's tune a decision tree. The code is similar, with SVC replaced by DecisionTreeClassifier. The search covers the maximum depth of the tree, the number of features considered at each split, and the split criterion:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from sklearn.tree import DecisionTreeClassifier

iris=load_iris()
X=iris.data
y=iris.target

def hyperopt_train_test(params):
    clf =DecisionTreeClassifier(**params)
    return cross_val_score(clf, X, y).mean()

space_dt = {
    'max_depth': hp.choice('max_depth', range(1,20)),
    'max_features': hp.choice('max_features', range(1,5)),
    'criterion': hp.choice('criterion', ["gini", "entropy"]),
}

def f(params):
    acc = hyperopt_train_test(params)
    # fmin minimizes, so return the negative accuracy as the loss
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()
best = fmin(f, space_dt, algo=tpe.suggest, max_evals=100, trials=trials)
print(best)
#{'criterion': 0, 'max_depth': 17, 'max_features': 1}
# all three are hp.choice indices: criterion 'gini', max_depth 18, max_features 2
As before, plot the results to see the effect:
from matplotlib import pyplot as plt
parameters = ['max_depth', 'max_features', 'criterion']
cols = len(parameters)
fig, axes = plt.subplots(1,cols)
for i, val in enumerate(parameters):
    # the x values are hp.choice indices, not the parameter values themselves
    xs = [t['misc']['vals'][val] for t in trials.trials]
    ys = [-t['result']['loss'] for t in trials.trials]
    axes[i].scatter(xs, ys, c="g")
    axes[i].set_title(val)
    axes[i].set_ylim([0.9, 1.0])
plt.show()
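The resulting figure has one scatter plot per parameter (max_depth, max_features, criterion), with the cross-validation accuracy on the y-axis.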
5. Summary
With hyperopt we can search for good parameters efficiently in later work: define the loss function to minimize and the search space, pass both to fmin(), and let it iterate toward the best value. Passing a Trials() object records every evaluation, and plotting that record makes the search easier to inspect.
github : https://github.com/hyperopt/hyperopt