Hyperopt: Distributed Asynchronous Hyperparameter Optimization

1. Overview

When training deep learning models, hyperparameter tuning is a fairly tedious process. It is usually done with grid search or manual search, which can make it feel a bit like Taishang Laojun refining elixirs: somewhat mysterious.
So is there a tool that can tune the parameters automatically? The Hyperopt library introduced in this section does exactly that.
Hyperopt is a Python library for serial and parallel optimization over complex search spaces that may include real-valued, discrete, and conditional dimensions.

Hyperopt currently implements three algorithms:
Random Search
Tree of Parzen Estimators (TPE)
Adaptive TPE

Hyperopt was designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees, but these algorithms are not currently implemented. All algorithms can be parallelized in the following two ways:
Apache Spark
MongoDB
The former is a big data processing engine; the latter is a distributed document database.
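Both backends are used through drop-in replacements for the default Trials object (introduced in section 3.5). Below is a minimal sketch, assuming pyspark is installed and a Spark session is available for SparkTrials, and that a MongoDB server is reachable with hyperopt-mongo-worker processes running for MongoTrials; the database URL and exp_key are placeholders:

from hyperopt import fmin, tpe, hp, SparkTrials
from hyperopt.mongoexp import MongoTrials

def objective(x):
    return x ** 2

space = hp.uniform('x', -10, 10)

# Spark: each trial is evaluated as a Spark task, up to `parallelism` at a time
spark_trials = SparkTrials(parallelism=4)
best_spark = fmin(objective, space, algo=tpe.suggest,
                  max_evals=100, trials=spark_trials)

# MongoDB: fmin enqueues trials in the database, and separate
# `hyperopt-mongo-worker` processes pull and evaluate them
mongo_trials = MongoTrials('mongo://localhost:27017/hyperopt_db/jobs',
                           exp_key='exp1')
best_mongo = fmin(objective, space, algo=tpe.suggest,
                  max_evals=100, trials=mongo_trials)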

2. Install hyperopt

Installation (using the Douban PyPI mirror is still recommended):

pip3 install --user hyperopt -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

You can see that the following modules are installed:

Successfully built future
Installing collected packages: zipp, numpy, importlib-resources, decorator, tqdm, scipy, py4j, networkx, future, cloudpickle, hyperopt
Successfully installed cloudpickle-2.2.1 decorator-4.4.2 future-0.18.3 hyperopt-0.2.7 importlib-resources-5.4.0 networkx-2.5.1 numpy-1.19.5 py4j-0.10.9.7 scipy-1.5.4 tqdm-4.64.1 zipp-3.6.0

3. Test

3.1. hyperopt_test.py

After installing it, let's test an example:

gedit hyperopt_test.py
from hyperopt import fmin, tpe, space_eval, hp
def objective(args):
    case, val = args
    if case == 'case 1':
        return val
    else:
        return val ** 2

# define a search space
space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

# minimize the objective over the space
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)

print(best)
print(space_eval(space, best))



best2 = fmin(fn=lambda x: x ** 2,
    space=hp.uniform('x', -8, -2),
    algo=tpe.suggest,
    max_evals=200)
print(best2)

Run:

python3 hyperopt_test.py
'''
100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 269.38trial/s, best loss: 6.787702954398033e-05]
{'a': 1, 'c2': -0.008238751698162794}
('case 2', -0.008238751698162794)
100%|██████████████████████████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 335.54trial/s, best loss: 4.000953693453848]
{'x': -2.000238409153731}
'''

Here, objective is the objective function; fmin minimizes it over the search space and returns the optimal parameters as a dictionary.
The iterations display a progress bar (which can be disabled with verbose=False). The first search (best) runs 100 evaluations and the second (best2) runs 200; the output also shows the best loss reached and the optimized result.

3.2. fmin function

The key function here is fmin. Let's check its documentation with help(fmin):

fmin(fn, space, algo=None, max_evals=None, timeout=None, loss_threshold=None, trials=None, rstate=None, allow_trials_fmin=True, pass_expr_memo_ctrl=None, catch_eval_exceptions=False, verbose=True, return_argmin=True, points_to_evaluate=None, max_queue_len=1, show_progressbar=True, early_stop_fn=None, trials_save_file='')
Minimize a function over a hyperparameter space.
    
    More realistically: *explore* a function over a hyperparameter space
    according to a given algorithm, allowing up to a certain number of
    function evaluations.  As points are explored, they are accumulated in
    `trials`
    
    
    Parameters
    ----------
    
    fn : callable (trial point -> loss)
        This function will be called with a value generated from `space`
        as the first and possibly only argument.  It can return either
        a scalar-valued loss, or a dictionary.  A returned dictionary must
        contain a 'status' key with a value from `STATUS_STRINGS`, must
        contain a 'loss' key if the status is `STATUS_OK`. Particular
        optimization algorithms may look for other keys as well.  An
        optional sub-dictionary associated with an 'attachments' key will
        be removed by fmin its contents will be available via
        `trials.trial_attachments`. The rest (usually all) of the returned
        dictionary will be stored and available later as some 'result'
        sub-dictionary within `trials.trials`.
    
    space : hyperopt.pyll.Apply node or "annotated"
        The set of possible arguments to `fn` is the set of objects
        that could be created with non-zero probability by drawing randomly
        from this stochastic program involving hp_<xxx> nodes
        (see `hyperopt.hp` and `hyperopt.pyll_utils`).
        If set to "annotated", will read space using type hint in fn. Ex:
        (`def fn(x: hp.uniform("x", -1, 1)): return x`)
    
    algo : search algorithm
        This object, such as `hyperopt.rand.suggest` and
        `hyperopt.tpe.suggest` provides logic for sequential search of the
        hyperparameter space.
    
    max_evals : int
        Allow up to this many function evaluations before returning.
    
    timeout : None or int, default None
        Limits search time by parametrized number of seconds.
        If None, then the search process has no time constraint.
    
    loss_threshold : None or double, default None
        Limits search time when minimal loss reduced to certain amount.
        If None, then the search process has no constraint on the loss,
        and will stop based on other parameters, e.g. `max_evals`, `timeout`
    
    trials : None or base.Trials (or subclass)
        Storage for completed, ongoing, and scheduled evaluation points.  If
        None, then a temporary `base.Trials` instance will be created.  If
        a trials object, then that trials object will be affected by
        side-effect of this call.
    rstate : numpy.random.Generator, default numpy.random or `$HYPEROPT_FMIN_SEED`
        Each call to `algo` requires a seed value, which should be different
        on each call. This object is used to draw these seeds via `randint`.
        The default rstate is
        `numpy.random.default_rng(int(env['HYPEROPT_FMIN_SEED']))`
        if the `HYPEROPT_FMIN_SEED` environment variable is set to a non-empty
        string, otherwise np.random is used in whatever state it is in.
    
    verbose : bool
        Print out some information to stdout during search. If False, disable
            progress bar irrespectively of show_progressbar argument
    
    allow_trials_fmin : bool, default True
        If the `trials` argument
    
    pass_expr_memo_ctrl : bool, default False
        If set to True, `fn` will be called in a different more low-level
        way: it will receive raw hyperparameters, a partially-populated
        `memo`, and a Ctrl object for communication with this Trials
        object.
    
    return_argmin : bool, default True
        If set to False, this function returns nothing, which can be useful
        for example if it is expected that `len(trials)` may be zero after
        fmin, and therefore `trials.argmin` would be undefined.
    
    points_to_evaluate : list, default None
        Only works if trials=None. If points_to_evaluate equals None then the
        trials are evaluated normally. If list of dicts is passed then
        given points are evaluated before optimisation starts, so the overall
        number of optimisation steps is len(points_to_evaluate) + max_evals.
        Elements of this list must be in a form of a dictionary with variable
        names as keys and variable values as dict values. Example
        points_to_evaluate value is [{'x': 0.0, 'y': 0.0}, {'x': 1.0, 'y': 2.0}]
    Returns
    -------
    
    argmin : dictionary
        If return_argmin is True returns `trials.argmin` which is a dictionary.  Otherwise
        this function returns the result of `hyperopt.space_eval(space, trials.argmin)` if there
        were successful trials. This object shares the same structure as the space passed.
        If there were no successful trials, it returns None.

    max_queue_len : integer, default 1
        Sets the queue length generated in the dictionary or trials. Increasing this
        value helps to slightly speed up parallel simulations which sometimes lag
        on suggesting a new trial.
    
    show_progressbar : bool or context manager, default True (or False if verbose is False).
        Show a progressbar. See `hyperopt.progress` for customizing progress reporting.
    
    early_stop_fn: callable ((result, *args) -> (Boolean, *args)).
        Called after every run with the result of the run and the values returned by the function previously.
        Stop the search if the function returns True.
        Default None.
    
    trials_save_file: str, default ""
        Optional file name to save the trials object to every iteration.
        If specified and the file already exists, will load from this file when
        trials=None instead of creating a new base.Trials object
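To see a few of these parameters in action, here is a small sketch (the seed, thresholds, and file name are arbitrary choices, assuming hyperopt 0.2.7 as installed above): it fixes the random state for reproducibility, evaluates a known point first, and stops early once the loss is good enough or stops improving.

import numpy as np
from hyperopt import fmin, tpe, hp
from hyperopt.early_stop import no_progress_loss

best = fmin(
    fn=lambda x: (x - 3) ** 2,
    space=hp.uniform('x', -10, 10),
    algo=tpe.suggest,
    max_evals=500,
    rstate=np.random.default_rng(42),      # reproducible search
    points_to_evaluate=[{'x': 0.0}],       # evaluate this point before the search starts
    loss_threshold=1e-6,                   # stop once the loss is small enough
    early_stop_fn=no_progress_loss(50),    # or after 50 trials without improvement
    trials_save_file='trials.pkl')         # checkpoint the trials object every iteration
print(best)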

3.3. Visualization function

Let's look at another example, y=(x-3)². First, plot the function so that it is easier to see:

import numpy as np
import matplotlib.pylab as plt

x=np.linspace(-10,16)
y=(x-3)**2
plt.xlabel('x')
plt.ylabel('y')
plt.plot(x,y,'r--',label='(x-3)**2')
plt.title("y=(x-3)**2")
#plt.legend()
plt.show()

As shown below:

For more drawing skills, you can check out: Python drawing (histograms, multiple subgraphs, two-dimensional graphics, three-dimensional graphics, and pictures within pictures) 

From the plot we can see that the function is minimized at x = 3 (of course, you could tell this without the plot). Now let's test it:

best = fmin(
    fn=lambda x: (x-3)**2,
    space=hp.uniform('x', -10, 10),
    algo=tpe.suggest,
    max_evals=100)
print(best)
#{'x': 2.967563715953902}

Try increasing max_evals to 1000 and see what happens; the result will be closer to 3.

3.4. hp value ranges

space defines the search range; hp provides the following expressions for specifying values:

'choice', 'lognormal', 'loguniform', 'normal', 'pchoice', 'qlognormal', 'qloguniform', 'qnormal', 'quniform', 'randint', 'uniform', 'uniformint'

Note that values drawn from the normal distribution are not restricted to a bounded range. Let's run a comparison test:

from hyperopt import hp
import hyperopt.pyll.stochastic
 
space = {
    'x':hp.uniform('x', 0, 1),
    'y':hp.normal('y', 0, 1),
    'z':hp.randint('z',0,10),
    'c':hp.choice('City', ['GuangZhou','ShangHai', 'BeiJing']),
}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'GuangZhou', 'x': 0.38603237555669656, 'y': -0.19782139601114704, 'z': array(1)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'ShangHai', 'x': 0.7838648171908386, 'y': 0.43014722187588245, 'z': array(8)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'BeiJing', 'x': 0.5137264208587933, 'y': -0.10021079359026988, 'z': array(4)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'BeiJing', 'x': 0.7201793839228087, 'y': 0.11571302115909506, 'z': array(0)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'GuangZhou', 'x': 0.21906317438496536, 'y': -1.645732195658909, 'z': array(0)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'ShangHai', 'x': 0.17319873908122796, 'y': -0.7472225692827178, 'z': array(4)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'GuangZhou', 'x': 0.4376348587045986, 'y': 0.7303201600143362, 'z': array(7)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'BeiJing', 'x': 0.43311251571433906, 'y': 1.216596288611056, 'z': array(1)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'BeiJing', 'x': 0.17755989388617366, 'y': 0.3168677593459059, 'z': array(4)}
>>> print(hyperopt.pyll.stochastic.sample(space))
{'c': 'GuangZhou', 'x': 0.6058631246917083, 'y': -0.2849664724345445, 'z': array(1)}

In the sampled output, the normally distributed y takes negative values, while the other variables stay within their specified ranges.
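Besides uniform, normal, randint and choice, the q- and log- variants are often what you want in practice: a learning rate is usually searched on a log scale, and a layer width in fixed steps. A small sketch (the parameter names and ranges are only illustrative):

import numpy as np
from hyperopt import hp
import hyperopt.pyll.stochastic

space = {
    # log-uniform between 1e-5 and 1e-1 (bounds are given in log space)
    'lr': hp.loguniform('lr', np.log(1e-5), np.log(1e-1)),
    # multiples of 32 between 32 and 512 (returned as floats)
    'units': hp.quniform('units', 32, 512, 32),
    # integer-valued uniform between 1 and 5
    'layers': hp.uniformint('layers', 1, 5),
    # conditional sub-space: the 'sgd' branch carries its own momentum parameter
    'optimizer': hp.choice('optimizer', [
        {'name': 'sgd', 'momentum': hp.uniform('momentum', 0.0, 0.99)},
        {'name': 'adam'},
    ]),
}
print(hyperopt.pyll.stochastic.sample(space))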

3.5. Trials tracking

A Trials object records information returned during the iterations. Let's look at an example:

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

fspace = {
    'x': hp.uniform('x', -5, 5)
}

def f(params):
    x = params['x']
    val = (x-3)**2
    return {'loss': val, 'status': STATUS_OK}

trials = Trials()
best = fmin(fn=f, space=fspace, algo=tpe.suggest, max_evals=50, trials=trials)

print(best)
#{'x': 2.842657137743265}

for trial in trials.trials[:5]:
    print(trial)
'''
{'state': 2, 'tid': 0, 'spec': None, 'result': {'loss': 12.850632865897229, 'status': 'ok'}, 'misc': {'tid': 0, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [0]}, 'vals': {'x': [-0.5847779381570106]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 615000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 615000)}
{'state': 2, 'tid': 1, 'spec': None, 'result': {'loss': 23.862240347848957, 'status': 'ok'}, 'misc': {'tid': 1, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [1]}, 'vals': {'x': [-1.884899215730961]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000)}
{'state': 2, 'tid': 2, 'spec': None, 'result': {'loss': 42.84157056715999, 'status': 'ok'}, 'misc': {'tid': 2, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [2]}, 'vals': {'x': [-3.545347245728067]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000)}
{'state': 2, 'tid': 3, 'spec': None, 'result': {'loss': 0.8412634189024095, 'status': 'ok'}, 'misc': {'tid': 3, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [3]}, 'vals': {'x': [3.9172041315336568]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000)}
{'state': 2, 'tid': 4, 'spec': None, 'result': {'loss': 30.580983627886543, 'status': 'ok'}, 'misc': {'tid': 4, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [4]}, 'vals': {'x': [-2.5300075612865616]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 618000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 618000)}
'''
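Besides iterating over trials.trials directly, the Trials object has a few convenience accessors. A short sketch using the trials object from the example above:

print(trials.best_trial['result'])   # full result dict of the best trial
print(trials.argmin)                 # same dictionary that fmin returns, e.g. {'x': 2.84...}
print(trials.losses()[:5])           # losses in evaluation order
print(trials.results[:5])            # the dictionaries returned by f()
print(len(trials))                   # number of trials recorded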

Similarly, we can plot the iteration information above to make it more intuitive:

import matplotlib.pylab as plt

x=[t['misc']['vals']['x'] for t in trials.trials]
y=[t['result']['loss'] for t in trials.trials]
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(x,y,c='r')
plt.show()

As shown in the picture:

You can see that the function reaches its minimum near x = 3.
For more knowledge about scatter plots, you can check out: Python Drawing Scatter Plot (plt.scatter)

Digression: Trials is easily misread as Trails (traces), but Trials, meaning attempts or experiments, is in fact the intended word here: each evaluation the optimizer records is one trial.

4. Practical application

With the above knowledge in mind, let's test it on real applications. We'll start with a K-nearest-neighbors example on the iris dataset (150 samples across three classes: setosa, versicolor, and virginica):

4.1. K-nearest neighbors (KNN)

Install the required libraries first (skip any that are already installed):

pip3 install --user scikit-learn -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

Next up is the code:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
 
iris=load_iris()
X=iris.data
y=iris.target
 
def hyperopt_train(params):
    clf=KNeighborsClassifier(**params)
    return cross_val_score(clf,X,y).mean()

space_knn={'n_neighbors':hp.choice('n_neighbors',range(1,50))}

def f(params):
    acc=hyperopt_train(params)
    return {'loss':-acc,'status':STATUS_OK}

trials=Trials()
best=fmin(f,space_knn,algo=tpe.suggest,max_evals=100,trials=trials)
print(best)
#{'n_neighbors': 6}
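One caveat: because n_neighbors is defined with hp.choice over range(1, 50), the number in best is the index of the chosen option rather than the value itself (index 6 corresponds to n_neighbors=7), and the same applies to the x-axis of the plot below. space_eval decodes the indices back into actual parameter values; a short sketch reusing space_knn and best from above:

from hyperopt import space_eval

best_params = space_eval(space_knn, best)
print(best_params)                    # e.g. {'n_neighbors': 7}

# refit a classifier with the decoded parameters
clf = KNeighborsClassifier(**best_params)
clf.fit(X, y)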

Similarly, let's plot it for a more intuitive view:

import matplotlib.pylab as plt

x=[t['misc']['vals']['n_neighbors'] for t in trials.trials]
y=[-t['result']['loss'] for t in trials.trials]
plt.xlabel('n_neighbors')
plt.ylabel('cross_val_score')
plt.scatter(x,y,c='r')
plt.show()

4.2. Support vector classification (SVC)

Now let's look at support vector classification on the same iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from sklearn.svm import SVC 

iris=load_iris()
X=iris.data
y=iris.target

def hyperopt_train_test(params):
    clf =SVC(**params)
    return cross_val_score(clf, X, y).mean()

space_svm = {
    'C': hp.uniform('C', 0, 20),
    'kernel': hp.choice('kernel', ['linear', 'sigmoid', 'poly', 'rbf']),
    'gamma': hp.uniform('gamma', 0, 20),
}

def f(params):
    acc = hyperopt_train_test(params)
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()
best = fmin(f, space_svm, algo=tpe.suggest, max_evals=100, trials=trials)
print(best)
#{'C': 0.8930681939735963, 'gamma': 8.379245134714441, 'kernel': 0}
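As with KNN, 'kernel': 0 in best is a choice index (here it decodes to 'linear'). We can recover the actual parameters with space_eval and, for instance, refit a final classifier with them; a short sketch reusing space_svm and best from above:

from hyperopt import space_eval

best_params = space_eval(space_svm, best)
print(best_params)    # e.g. {'C': 0.89..., 'gamma': 8.37..., 'kernel': 'linear'}

final_clf = SVC(**best_params)
print(cross_val_score(final_clf, X, y).mean())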

Let's draw the same kind of plot to see the effect:

from matplotlib import pyplot as plt

parameters = ['C', 'kernel', 'gamma']
cols = len(parameters)
f, axes = plt.subplots(1,cols)
for i, val in enumerate(parameters):
    xs = [t['misc']['vals'][val] for t in trials.trials]
    ys = [-t['result']['loss'] for t in trials.trials]
    axes[i].scatter(xs, ys, c="g")
    axes[i].set_title(val)
    axes[i].set_ylim([0.9, 1.0])

plt.show()

As shown in the picture:

4.3. Decision tree (DecisionTreeClassifier)

Let's look at optimizing a decision tree. The code is similar; SVC is simply replaced by DecisionTreeClassifier, and we optimize over the maximum depth, the number of features considered, and the split criterion:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from sklearn.tree import DecisionTreeClassifier


iris=load_iris()
X=iris.data
y=iris.target

def hyperopt_train_test(params):
    clf =DecisionTreeClassifier(**params)
    return cross_val_score(clf, X, y).mean()

space_dt = {
    'max_depth': hp.choice('max_depth', range(1,20)),
    'max_features': hp.choice('max_features', range(1,5)),
    'criterion': hp.choice('criterion', ["gini", "entropy"]),
}

def f(params):
    acc = hyperopt_train_test(params)
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()
best = fmin(f, space_dt, algo=tpe.suggest, max_evals=100, trials=trials)
print(best)
#{'criterion': 0, 'max_depth': 17, 'max_features': 1}
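Here, too, the printed criterion and max_features are choice indices (criterion 0 is 'gini', and max_features index 1 corresponds to the value 2), which space_eval can decode as before. One more practical point: if some parameter combinations can raise exceptions, the objective can report them as failed trials instead of aborting the search. A small sketch, reusing hyperopt_train_test and STATUS_OK from the code above and additionally importing STATUS_FAIL:

from hyperopt import STATUS_FAIL

def f(params):
    try:
        acc = hyperopt_train_test(params)
        return {'loss': -acc, 'status': STATUS_OK}
    except Exception as e:
        # mark the trial as failed; fmin ignores it when picking the best point
        return {'status': STATUS_FAIL, 'exception': str(e)}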

Similarly, let's plot it to see the effect:

from matplotlib import pyplot as plt

parameters = ['max_depth', 'max_features', 'criterion']
cols = len(parameters)
f, axes = plt.subplots(1,cols)
for i, val in enumerate(parameters):
    xs = [t['misc']['vals'][val] for t in trials.trials]
    ys = [-t['result']['loss'] for t in trials.trials]
    axes[i].scatter(xs, ys, c="g")
    axes[i].set_title(val)
    axes[i].set_ylim([0.9, 1.0])

plt.show()

As shown in the picture:

5. Summary 

Through this introduction to hyperopt, we can efficiently find optimal parameters in future work: define the loss function to be optimized and the search space, pass them to fmin(), and let it iterate to the best values. We can also pass in a Trials() object to track the iteration information and visualize it with plots, which makes the behavior much easier to observe.

Articles about some techniques for finding optimal parameters and hyperparameters:
Neural Network Techniques: How to Find the Optimal Parameters
Neural Network Techniques: How to Find the Optimal Parameters [Continued]
Neural Network Techniques: Finding the Optimal Hyperparameters
github : https://github.com/hyperopt/hyperopt

Origin blog.csdn.net/weixin_41896770/article/details/132868806