Optuna, a hyperparameter tuning framework (usable with PyTorch)



Foreword

With the rapid development of deep learning, hyperparameter optimization for deep learning models has always been a headache. When there are only a few hyperparameters, grid search is a fairly common method, but as the number of hyperparameters grows (especially for neural networks, where the search space combines training hyperparameters with the network's own architectural hyperparameters), the space becomes huge: grid search consumes enormous resources and performs poorly. A framework for "machine alchemy" is therefore needed.

Optuna is a very commonly used hyperparameter tuning framework. Its advantages are that it is simple to use, easy to embed into existing code, and allows the parameter space to be constructed dynamically. There are also other frameworks that can optimize hyperparameters, such as the AutoML tools mentioned by Mr. Li Mu.


1. How to use optuna

First, install the third-party library from the command line with pip install optuna, and then import it.

There are two key terms to pay attention to in optuna:
trial: a single experiment (one evaluation of the objective function)
study: a learning process (consisting of multiple trials)

import optuna

def obj(trial):
    x = trial.suggest_float('x', 1, 5)
    return (x - 3) * (x - 3)

stu = optuna.create_study(study_name='test', direction='minimize')
stu.optimize(obj, n_trials=50)
print(stu.best_params)
print(stu.best_trial)
print(stu.best_trial.value)

In this example code, the function obj defines the module to be optimized, the hyperparameter to be tuned is 'x', and the return value is the objective value of the module. The hyperparameter x is a float, and its search space is the closed interval [1, 5]. Other commonly used calls are suggest_int for integers and suggest_categorical for a set of categorical choices (for example, strings):

trial.suggest_int('name', 10, 50)
trial.suggest_categorical('active', ['relu', 'sigmoid', 'tanh'])
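
For hyperparameters that span several orders of magnitude (such as a learning rate), a log-scaled search space is usually more natural than a linear one. A minimal sketch using the log=True option of suggest_float:

lr = trial.suggest_float('lr', 1e-4, 1e-2, log=True)  # sampled on a log scale between 1e-4 and 1e-2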

The study represents a learning process, and the direction parameter is 'minimize', which means that the return value of obj (which is also the objective value of each trial) is optimized towards its minimum.
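
As a side note, create_study also accepts the opposite direction and an explicit sampler. The snippet below is only a sketch; the seeded TPESampler is one possible choice, used here to make searches reproducible:

sampler = optuna.samplers.TPESampler(seed=42)  # fix the sampler seed so repeated runs explore the same points
stu = optuna.create_study(study_name='test', direction='maximize', sampler=sampler)  # maximize e.g. an accuracy score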

2. Results visualization

optuna.visualization contains a wealth of visualization tools. The following three are recommended:

optuna.visualization.plot_param_importances(stu).show()
optuna.visualization.plot_optimization_history(stu).show()
optuna.visualization.plot_slice(stu).show()

plot_param_importances shows the importance of each hyperparameter to the objective value


plot_optimization_history shows, for each of the n_trials trials, the objective value and the best value found so far

plot_slice shows, as a scatter plot, the distribution of values tried for each hyperparameter across all trials
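These plot_* helpers return plotly figures, so besides calling .show() they can also be saved to a file. A small sketch (plotly must be installed, which optuna.visualization requires anyway):

fig = optuna.visualization.plot_optimization_history(stu)
fig.write_html('optimization_history.html')  # save the interactive plot as an HTML file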


3. Using optuna in pytorch code

Using optuna with an MLP built in pytorch shows how flexible the tuning framework is: you can tune training parameters such as the batch size and learning rate, as well as network parameters such as the hidden layer size and the type of activation function.
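
The training code below references an MLP model, a myDataset class, and a test() evaluation helper that the original post does not show. The following is a minimal sketch of the first two, only to make the example self-contained: the input dimension, the synthetic regression data, and the 3-tuple returned by __getitem__ (matching the `inputs, labels, _ = data` unpacking in the training loop) are assumptions, not the author's original code.

import torch
from torch import nn
from torch.utils.data import Dataset

class MLP(nn.Module):
    def __init__(self, hidden_layer, activefunc):
        super().__init__()
        activations = {'relu': nn.ReLU(), 'sigmoid': nn.Sigmoid(), 'tanh': nn.Tanh()}
        self.net = nn.Sequential(
            nn.Linear(10, hidden_layer),   # input dimension 10 is a placeholder
            activations[activefunc],
            nn.Linear(hidden_layer, 1),    # single regression output
        )

    def forward(self, x):
        return self.net(x)

class myDataset(Dataset):
    def __init__(self, n):
        self.x = torch.randn(n, 10)                  # random features (placeholder data)
        self.y = self.x.sum(dim=1, keepdim=True)     # toy regression target
        self.idx = torch.arange(n)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        # 3-tuple matches the `inputs, labels, _ = data` unpacking in the training loop
        return self.x[i], self.y[i], self.idx[i]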

import time

import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torch.autograd import Variable  # legacy wrapper; plain tensors behave the same in modern PyTorch

import optuna

num_epoches = 100  # number of training epochs (value assumed; not specified in the original post)

def train(batch_size, learning_rate, lossfunc, opt, hidden_layer, activefunc, weightdk, momentum):  # hyperparameters chosen by optuna
    trainset_num = 800
    testset_num = 50

    train_dataset = myDataset(trainset_num)
    test_dataset = myDataset(testset_num)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

    # build the model and set up the loss function and optimizer
    model = MLP(hidden_layer, activefunc).cuda()
    # print(model)
    if lossfunc == 'MSE':
        criterion = nn.MSELoss().cuda()
    elif lossfunc == 'MAE':
        criterion = nn.L1Loss().cuda()

    # pick the optimizer according to the 'opt' hyperparameter (RMSprop kept as the fallback used in the original)
    if opt == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weightdk)
    elif opt == 'SGD':
        optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weightdk, momentum=momentum)
    else:
        optimizer = optim.RMSprop(model.parameters(), lr=learning_rate, weight_decay=weightdk, momentum=momentum)
    # training loop
    for epoch in range(num_epoches):
        model.train()
        for i, data in enumerate(train_loader):
            inputs, labels, _ = data
            inputs = Variable(inputs).float().cuda()
            labels = Variable(labels).float().cuda()
            # forward pass
            out = model(inputs)
            # a regularization term could be added here
            train_loss = criterion(out, labels)
            # backward pass and parameter update
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

    model.eval()

    testloss = test()  # test() is assumed to be defined elsewhere; it returns the MAE on the test set
    print('Test MAE = ', testloss)
    return testloss


def objective(trial):
    batchsize = trial.suggest_int('batchsize', 1, 16)
    lr = trial.suggest_float('lr', 1e-4, 1e-2, step=0.0001)
    lossfunc = trial.suggest_categorical('loss', ['MSE', 'MAE'])
    opt = trial.suggest_categorical('opt', ['Adam', 'SGD'])
    hidden_layer = trial.suggest_int('hiddenlayer', 20, 1200)
    activefunc = trial.suggest_categorical('active', ['relu', 'sigmoid', 'tanh'])
    weight_decay = trial.suggest_float('weight_decay', 0, 1, step=0.01)
    momentum = trial.suggest_float('momentum', 0, 1, step=0.01)
    loss = train(batchsize, lr, lossfunc, opt, hidden_layer, activefunc, weight_decay, momentum)
    return loss

if __name__ == '__main__':
    st=time.time()
    study = optuna.create_study(study_name='test', direction='minimize')
    study.optimize(objective, n_trials=500)
    print(study.best_params)
    print(study.best_trial)
    print(study.best_trial.value)
    print(time.time()-st)
    optuna.visualization.plot_param_importances(study).show()
    optuna.visualization.plot_optimization_history(study).show()
    optuna.visualization.plot_slice(study).show()

Original post: blog.csdn.net/weixin_45667108/article/details/126879782