Optimizing XGBoost parameters with Bayesian optimization

Besides the grid search and random search we usually use, I found that Bayesian optimization works quite well. I gave it a try, the results were good, and I am sharing my code here:

Bayesian optimization finds the value that minimizes an objective function by building a surrogate function (a probability model) from the objective's past evaluation results. Unlike random or grid search, the Bayesian method consults the previous evaluation results when choosing the next set of hyperparameters to try, which saves a lot of wasted work.

Evaluating hyperparameters is expensive, because each evaluation means training a model with those hyperparameters, and many deep learning models take hours or even days to train and evaluate. Bayesian tuning keeps a continually updated probability model and uses what it has learned from past results to "focus" on promising hyperparameters, as the small sketch below illustrates.
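To make the surrogate-model idea concrete, here is a minimal, self-contained sketch using skopt's gp_minimize on a toy one-dimensional function (the function and bounds are invented for illustration; this is not the tuning code below):

from skopt import gp_minimize

# Toy objective: a simple 1-D function we want to minimize.
# In real tuning this would be "train the model and return a loss".
def objective(params):
    x = params[0]
    return (x - 2.0) ** 2 + 0.1 * x

# gp_minimize fits a Gaussian-process surrogate to the past (x, f(x))
# pairs and picks each new x where the surrogate looks most promising.
res = gp_minimize(
    objective,
    dimensions=[(-5.0, 5.0)],  # search bounds for x
    n_calls=20,                # total objective evaluations
    random_state=42,
)
print(res.x, res.fun)  # best x found and its objective value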

1 Import the libraries

from skopt import BayesSearchCV
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.utils import shuffle

2 Load data

train_path = 'ads_train.csv'
train_data = pd.read_csv(train_path)

3 Dataset feature processing

# Shuffle the rows, select the feature columns and the target, and hold out a test split
train_data = shuffle(train_data)
X = train_data[['isbuyer', 'buy_freq', 'visit_freq', 'buy_interval',
                'sv_interval', 'expected_time_buy', 'expected_time_visit',
                'last_buy', 'multiple_buy', 'multiple_visit', 'uniq_urls',
                'num_checkins']]
Y = train_data[['y_buy']]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
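Since scale_pos_weight will be tuned below, it can help to first check how imbalanced the target is; a small sketch using the y_buy column already loaded:

# Fraction of positive vs. negative examples in the target
print(train_data['y_buy'].value_counts(normalize=True))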

4 Optimization code

ITERATIONS = 100
# Classifier and search space
bayes_cv_tuner = BayesSearchCV(
    estimator = xgb.XGBClassifier(
        n_jobs = 1,
        objective = 'binary:logistic',
        eval_metric = 'auc',
        verbosity = 0,  # 'silent' is deprecated in newer xgboost versions
        tree_method = 'approx'
    ),
    search_spaces = {
        'learning_rate': (0.01, 1.0, 'log-uniform'),
        'min_child_weight': (0, 5),
        'max_depth': (1, 50),  # 0 means "no depth limit" and is not supported by every tree_method
        'max_delta_step': (0, 20),
        'subsample': (0.01, 1.0, 'uniform'),
        'colsample_bytree': (0.01, 1.0, 'uniform'),
        'colsample_bylevel': (0.01, 1.0, 'uniform'),
        'reg_lambda': (1e-9, 1000, 'log-uniform'),
        'reg_alpha': (1e-9, 1.0, 'log-uniform'),
        'gamma': (1e-9, 0.5, 'log-uniform'),
        'n_estimators': (50, 100),
        'scale_pos_weight': (1e-6, 500, 'log-uniform')  # helps with imbalanced targets
    },
    scoring = 'roc_auc',
    cv = StratifiedKFold(
        n_splits=5,
        shuffle=True,
        random_state=42
    ),
    n_jobs = 6,
    n_iter = ITERATIONS,
    verbose = 0,
    refit = True,  # retrain the best configuration at the end
    random_state = 42
)
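A note on the search_spaces format: the tuples above are skopt shorthand that BayesSearchCV converts into dimension objects, with (low, high) integer pairs becoming Integer dimensions and (low, high, 'log-uniform') triples becoming Real dimensions. The explicit equivalent for two of the entries would look like this (illustrative only):

from skopt.space import Real, Integer

# Explicit skopt dimensions equivalent to the tuple shorthand above
explicit_spaces = {
    'learning_rate': Real(0.01, 1.0, prior='log-uniform'),
    'max_depth': Integer(1, 50),
}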

def status_print(optim_result):
    """Status callback during the Bayesian hyperparameter search"""

    # Get all the models tested so far in DataFrame format
    all_models = pd.DataFrame(bayes_cv_tuner.cv_results_)

    # Report the best score and parameters found so far
    print('Model #{}\nBest ROC-AUC: {}\nBest params: {}\n'.format(
        len(all_models),
        np.round(bayes_cv_tuner.best_score_, 4),
        dict(bayes_cv_tuner.best_params_)
    ))

    # Save all model results to disk after every iteration
    clf_name = bayes_cv_tuner.estimator.__class__.__name__
    all_models.to_csv(clf_name + "_cv_results.csv")

# Fit on the training split so that X_test / y_test remain a clean hold-out set;
# ravel() turns the single-column target DataFrame into the 1-D array sklearn expects
result = bayes_cv_tuner.fit(X_train.values, y_train.values.ravel(), callback=status_print)
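Because refit=True, the tuner finishes by retraining the best configuration, exposed as best_estimator_. A quick sanity check on the held-out split (a minimal sketch using the variables from step 3) might look like:

from sklearn.metrics import roc_auc_score

# Score the refit best estimator on the held-out test split
y_prob = bayes_cv_tuner.best_estimator_.predict_proba(X_test.values)[:, 1]
print('Held-out ROC-AUC:', roc_auc_score(y_test.values.ravel(), y_prob))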

Origin blog.csdn.net/w5688414/article/details/113484064