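How to tune xgboost parameters

This post covers parameter tuning only, not theory. I don't know the theory anyway, so take it or leave it.
It is derived from the blog post "xgboost theory + practice"; the student has not surpassed the master here.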

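Step 1: Determine n_estimators

Start with a relatively large learning rate: set learning_rate to 0.1. The other, less important parameters can be given rough values for now. Then run cross-validation with early stopping; the number of boosting rounds it keeps is the n_estimators to use in the following steps.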

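import xgboost as xgb

# train_x, train_y: your features and integer class labels (19 classes here)
xgb_train_data = xgb.DMatrix(train_x, label=train_y)
params = {
    'learning_rate': 0.1,
    # n_estimators is not set here; num_boost_round below plays that role
    'max_depth': 5,
    'min_child_weight': 1,
    'gamma': 0,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'objective': 'multi:softmax',
    'nthread': 4,
    'num_class': 19,
    'seed': 27,
}
# 5-fold CV for up to 10000 rounds; stop once mlogloss has not improved for 50 rounds
xgb_cv = xgb.cv(params, xgb_train_data, num_boost_round=10000, nfold=5,
                metrics='mlogloss', early_stopping_rounds=50)
# With early stopping the result is cut at the best round, so its length is n_estimators
len(xgb_cv)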
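To make it clearer why len(xgb_cv) can be read as the number of trees, here is a minimal pure-Python sketch of the early-stopping rule. It is an illustration under assumptions, not xgboost's actual code: the function name best_num_rounds and the loss values are made up, and the rule assumes a metric where lower is better, such as mlogloss.

```python
def best_num_rounds(losses, patience):
    """Return how many rounds to keep, given per-round CV losses (lower is better).

    Training stops once `patience` rounds have passed without a new best loss;
    the rounds after the best one are discarded.
    """
    best_loss = float("inf")
    best_round = 0
    for i, loss in enumerate(losses):
        if loss < best_loss:
            best_loss = loss
            best_round = i
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds
    return best_round + 1  # rounds are counted from 1


# Made-up losses: the best value within reach is at round 3.
cv_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
print(best_num_rounds(cv_losses, patience=3))
```

With a patience of 3 the search stops before it ever sees the 0.6 at round 7, which is why a generous early_stopping_rounds such as 50 is used above.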

Step 2: Determine max_depth and min_child_weight

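from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

params = {
    'max_depth': range(3, 12, 2),
    'min_child_weight': range(1, 8, 2),
}
# n_estimators comes from step 1 (201 here); max_depth and min_child_weight
# given to the estimator are overridden by the grid during the search
grid = GridSearchCV(
    XGBClassifier(learning_rate=0.1, n_estimators=201, max_depth=5,
                  min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8,
                  objective='multi:softmax', n_jobs=4, random_state=27),
    params, cv=5, scoring="accuracy", n_jobs=4)
grid.fit(train_x, train_y)
# Best combination and its cross-validated accuracy
grid.best_params_, grid.best_score_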

The usual approach is to first find a rough value over a coarse range, then search again with smaller steps around that value.
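The coarse-then-fine idea can be sketched in a few lines of plain Python. This is only an illustration: the helper names count_fits and refine_around, the lower bounds, and the "coarse winner" values are assumptions made for the example, not results from the run above.

```python
from itertools import product

# The coarse grid from this step.
coarse_grid = {
    "max_depth": list(range(3, 12, 2)),        # 3, 5, 7, 9, 11
    "min_child_weight": list(range(1, 8, 2)),  # 1, 3, 5, 7
}


def count_fits(grid, n_folds):
    """Number of models a grid search trains: combinations times folds."""
    n_combinations = len(list(product(*grid.values())))
    return n_combinations * n_folds


def refine_around(best, coarse_step, lower_bound):
    """Build a finer integer grid centred on the coarse winner."""
    fine_step = max(1, coarse_step // 2)
    candidates = [best - fine_step, best, best + fine_step]
    return [c for c in candidates if c >= lower_bound]


# Hypothetical coarse winner, for illustration only.
coarse_best = {"max_depth": 5, "min_child_weight": 1}
fine_grid = {
    "max_depth": refine_around(coarse_best["max_depth"], 2, lower_bound=1),
    "min_child_weight": refine_around(coarse_best["min_child_weight"], 2, lower_bound=0),
}
print(count_fits(coarse_grid, 5), fine_grid, count_fits(fine_grid, 5))
```

Counting the fits up front is a cheap way to see why a step of 2 is used for the first pass: every extra candidate value multiplies the training cost.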

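Step 3: Determine gamma

With the other parameters already tuned, we can now tune gamma. Gamma can take a very wide range of values; here I only try five candidates, 0.0 through 0.4 in steps of 0.1. You can of course search at a finer resolution. The grid is passed to GridSearchCV in the same way as in step 2, with the estimator updated to the best values found so far.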

params = {
 'gamma':[i/10.0 for i in range(0,5)]
}
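As a rough sketch of what "a finer resolution" could look like for gamma, the snippet below lists the five coarse candidates and then builds a denser grid around a chosen value. The helper finer_gamma_grid, its half_width and step defaults, and the choice of 0.2 as the coarse winner are all assumptions made for illustration.

```python
# The five coarse candidates used in this step.
coarse_gamma = [i / 10.0 for i in range(0, 5)]


def finer_gamma_grid(best, half_width=0.05, step=0.01):
    """Evenly spaced gamma values in [best - half_width, best + half_width].

    Values are rounded to hide floating-point noise, and negative values are
    dropped because gamma must be non-negative.
    """
    n_steps = int(round(2 * half_width / step))
    values = [round(best - half_width + k * step, 4) for k in range(n_steps + 1)]
    return [v for v in values if v >= 0.0]


# Hypothetical coarse winner, for illustration only.
fine_gamma = finer_gamma_grid(0.2)
print(coarse_gamma)
print(fine_gamma)
```

The resulting list can be used as the 'gamma' entry of the params dict in place of the coarse one.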

Step 4: Determine subsample and colsample_bytree

params = {
 'subsample':[i/10.0 for i in range(6,10)],
 'colsample_bytree':[i/10.0 for i in range(6,10)]
}

Step 5: Tune the regularization parameters

params = {
 'reg_alpha':[1e-5, 1e-2, 0.1, 1, 100]
}
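Taken together, the steps form a stage-by-stage search: each stage tunes one small group of parameters while everything found earlier stays fixed. The sketch below shows that control flow in plain Python. The function tune_stage and the scoring function toy_score are hypothetical; toy_score is a made-up stand-in with an arbitrary optimum, and in real use it would be replaced by a cross-validated score such as the one GridSearchCV computes.

```python
from itertools import product


def tune_stage(base_params, grid, score_fn):
    """Try every combination in `grid` on top of `base_params`; keep the best."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        candidate = {**base_params, **dict(zip(names, values))}
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


def toy_score(p):
    """Made-up score (higher is better) with an arbitrary optimum; NOT a real model."""
    return -((p["max_depth"] - 7) ** 2 + (p["min_child_weight"] - 3) ** 2
             + (p["gamma"] - 0.2) ** 2 + (p["subsample"] - 0.8) ** 2
             + (p["colsample_bytree"] - 0.7) ** 2 + (p["reg_alpha"] - 0.1) ** 2)


# Starting values, then the grids from steps 2 to 5 in order.
params = {"max_depth": 5, "min_child_weight": 1, "gamma": 0,
          "subsample": 0.8, "colsample_bytree": 0.8, "reg_alpha": 0}
stages = [
    {"max_depth": range(3, 12, 2), "min_child_weight": range(1, 8, 2)},
    {"gamma": [i / 10.0 for i in range(0, 5)]},
    {"subsample": [i / 10.0 for i in range(6, 10)],
     "colsample_bytree": [i / 10.0 for i in range(6, 10)]},
    {"reg_alpha": [1e-5, 1e-2, 0.1, 1, 100]},
]
for grid in stages:
    params, score = tune_stage(params, grid, toy_score)  # carry the winners forward
print(params)
```

Tuning in stages keeps the number of fits manageable compared with one grid over all parameters at once, at the cost of possibly missing interactions between parameters from different stages.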

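To sum up: in most cases the performance gain from parameter tuning is quite limited. If you want a big improvement, it still mainly comes down to feature engineering, annoyingly enough.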
