XGBoost Parameters

https://zhuanlan.zhihu.com/p/28672955

https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/

3.1 XGBoost's parameters fall into three main groups (names kept in English):

  1. General Parameters: control the overall functioning
  2. Booster Parameters: control the individual learner at each step
  3. Learning Task Parameters: control the optimization being performed

(1)General Parameters:

booster [default=gbtree]

  • Selects the type of model to run at each iteration. Two options:
  • gbtree: tree-based models
  • gblinear: linear models

silent [default=0]:

  • Set to 1 to suppress running messages
  • Set to 0 to print running messages

nthread [default to maximum number of threads available if not set]

  • Sets the degree of parallelism: how many cores to run on
  • If you want to run on all available cores, leave the default value

(2)Booster Parameters

There are two kinds of boosters, linear and tree-based; the tree booster is the more commonly used.

eta [default=0.3]

  • Analogous to the learning rate in GBM.
  • Makes the model more robust by shrinking the weights on each step.
  • Typical values: 0.01-0.2

min_child_weight [default=1]

  • Defines the minimum sum of weights of all observations required in a child.
  • Used to control over-fitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
  • Too high values can lead to under-fitting; hence, it should be tuned using CV.

max_depth [default=6]

  • The maximum depth of a tree, same as GBM.
  • Used to control over-fitting, as higher depth will allow the model to learn relations very specific to a particular sample.
  • Should be tuned using CV.
  • Typical values: 3-10

max_leaf_nodes

  • The maximum number of terminal nodes or leaves in a tree.
  • Also controls over-fitting by limiting the size of the tree.
  • Can be defined in place of max_depth: since binary trees are created, a depth of n would produce a maximum of 2^n leaves, so fixing the number of leaves also fixes the effective depth.
  • If this is defined, GBM will ignore max_depth.

gamma [default=0]

  • A node is split only when the resulting split gives a positive reduction in the loss function. Gamma specifies the minimum loss reduction required to make a split; if gamma is 0, any split that reduces the loss is made.
  • Makes the algorithm conservative. The values can vary depending on the loss function and should be tuned.

max_delta_step [default=0]

  • The maximum delta step we allow each tree's weight estimation to be. If the value is set to 0, there is no constraint. If it is set to a positive value, it makes the update step more conservative.
  • Usually this parameter is not needed, but it might help in logistic regression when classes are extremely imbalanced.

subsample [default=1]

  • Same as the subsample of GBM. Denotes the fraction of observations to be randomly sampled for each tree; e.g. 0.8 means each tree is built on a random 80% of the rows.
  • Lower values make the algorithm more conservative and prevent over-fitting, but values that are too small might lead to under-fitting.
  • Typical values: 0.5-1

colsample_bytree [default=1]

  • Similar to max_features in GBM. Denotes the fraction of columns to be randomly sampled for each tree; e.g. 0.8 means each tree is built on a random 80% of the columns.
  • Typical values: 0.5-1

colsample_bylevel [default=1]

  • Denotes the subsample ratio of columns for each split, at each level.
  • Rarely used, because subsample and colsample_bytree will usually do the job, but you can explore it further if you feel so inclined.

lambda [default=1]

  • L2 regularization term on weights (analogous to Ridge regression).
  • Handles the regularization part of XGBoost. Though many data scientists don't use it often, it should be explored to reduce over-fitting: L2 keeps the model from leaning too heavily on any single feature.

alpha [default=0]

  • L1 regularization term on weights (analogous to Lasso regression).
  • L1 regularization encourages sparse weights, so it can be used in case of very high dimensionality to make the algorithm run faster.

scale_pos_weight [default=1]

  • A value greater than 0 should be used in case of high class imbalance, as it helps in faster convergence.
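Below is a minimal sketch of these booster parameters in the native Python API. The values are only illustrative starting points, and the (X, y) data is synthetic stand-in data, not anything from the original article.

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-in data: 100 rows, 5 numeric features, binary labels.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "gbtree",
    "eta": 0.1,                  # learning rate
    "max_depth": 5,              # limits tree depth to control over-fitting
    "min_child_weight": 1,       # minimum sum of instance weights in a child
    "gamma": 0,                  # minimum loss reduction required to split
    "subsample": 0.8,            # sample 80% of rows per tree
    "colsample_bytree": 0.8,     # sample 80% of columns per tree
    "lambda": 1,                 # L2 regularization on weights
    "alpha": 0,                  # L1 regularization on weights
    "objective": "binary:logistic",
}
model = xgb.train(params, dtrain, num_boost_round=100)
```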

(3)Learning Task Parameters

These parameters are used to define the optimization objective and the metric to be calculated at each step.

objective [default=reg:linear]

  • This defines the loss function to be minimized. The most commonly used values are:
  • binary:logistic – logistic regression for binary classification; returns the predicted probability (not the class)
  • multi:softmax – multiclass classification using the softmax objective; returns the predicted class (not probabilities). You also need to set an additional num_class parameter defining the number of unique classes.
  • multi:softprob – same as softmax, but returns the predicted probability of each data point belonging to each class.
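As a hedged illustration of the multiclass objectives, the sketch below uses multi:softmax together with the required num_class parameter; the Iris data set is just a convenient three-class example, not part of the original article.

```python
from sklearn.datasets import load_iris
import xgboost as xgb

# Iris has 3 classes, so num_class must be set to 3.
X, y = load_iris(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "multi:softmax",  # returns the predicted class
    "num_class": 3,                # required for the multi:* objectives
}
model = xgb.train(params, dtrain, num_boost_round=20)
pred = model.predict(xgb.DMatrix(X))   # predicted labels 0/1/2
# With "multi:softprob" instead, predict() would return per-class probabilities.
```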

 

eval_metric [ default according to objective ]

  • The metric to be used for validation data.
  • The default values are rmse for regression and error for classification.
  • Typical values are:
  • rmse – root mean square error
  • mae – mean absolute error
  • logloss – negative log-likelihood
  • error – binary classification error rate (0.5 threshold)
  • merror – multiclass classification error rate
  • mlogloss – multiclass logloss
  • auc – area under the curve
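A small sketch of eval_metric in use, assuming the native API's evals (watchlist) argument; the breast-cancer data set and the 80/20 split are illustrative choices, not from the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import xgboost as xgb

# Illustrative binary-classification data and an 80/20 split.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dvalid = xgb.DMatrix(X_va, label=y_va)

# eval_metric is reported on every data set in `evals` at each boosting round.
params = {"objective": "binary:logistic", "eval_metric": "auc"}
model = xgb.train(params, dtrain, num_boost_round=50,
                  evals=[(dtrain, "train"), (dvalid, "valid")])
```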

 

seed [default=0]

  • The random number seed. If it is not set, each run can produce different results.
  • Can be used for generating reproducible results and also for parameter tuning.

 

If you have been using Scikit-Learn till now, these parameter names might not look familiar. The good news is that the xgboost module in Python has an sklearn wrapper called XGBClassifier, which uses the sklearn-style naming convention. The parameter names that change are (see the sketch after this list):

  1. eta –> learning_rate
  2. lambda –> reg_lambda
  3. alpha –> reg_alpha
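For illustration, a minimal sketch of the sklearn wrapper with the renamed parameters; the specific values are arbitrary.

```python
from xgboost import XGBClassifier

clf = XGBClassifier(
    learning_rate=0.1,   # eta in the native API
    reg_lambda=1.0,      # lambda (L2 term) in the native API
    reg_alpha=0.0,       # alpha (L1 term) in the native API
    max_depth=5,
    n_estimators=100,
)
# clf.fit(X_train, y_train) and clf.predict(X_test) then follow the usual sklearn API.
```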

 

 

(4)Parameter Tuning

param_grid

If the data set is not large, you can enumerate all candidate parameter values in a param_grid and validate every combination with GridSearchCV, as sketched below.
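A minimal sketch of that idea, assuming a small data set (scikit-learn's breast-cancer data is used here purely as an example) and an illustrative grid:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

# Every combination in param_grid is evaluated with 5-fold CV.
param_grid = {
    "max_depth": [3, 5, 7],
    "min_child_weight": [1, 3, 5],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(XGBClassifier(n_estimators=100), param_grid,
                      scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```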

booster

Set some initial values, then tune one group of parameters at a time:

  1. Keep the learning rate and booster parameters fixed; tune n_estimators.
  2. Keep n_estimators and the booster parameters fixed; tune the learning rate.
  3. Keep n_estimators and the learning rate fixed; tune the booster parameters.

For the booster parameters, start with max_depth and min_child_weight, which have the largest impact, then work through the other parameters that might matter.

Finally, shrink the learning rate to find its best value. The result is a reasonably good parameter combination; a sketch of this staged procedure follows.
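A rough sketch of the staged procedure, under the same illustrative assumptions as above (breast-cancer data, arbitrary grids); this is one way to implement the steps, not the article's exact recipe.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
import xgboost as xgb
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

# Step 1: with the learning rate fixed, pick n_estimators via CV early stopping.
cv = xgb.cv({"objective": "binary:logistic", "eta": 0.1, "max_depth": 5},
            xgb.DMatrix(X, label=y),
            num_boost_round=500, nfold=5, early_stopping_rounds=20)
n_estimators = len(cv)  # rounds kept before the metric stopped improving

# Step 2: with n_estimators fixed, tune the most influential booster
# parameters first (max_depth and min_child_weight).
grid = GridSearchCV(
    XGBClassifier(learning_rate=0.1, n_estimators=n_estimators),
    {"max_depth": [3, 5, 7], "min_child_weight": [1, 3, 5]},
    scoring="roc_auc", cv=5)
grid.fit(X, y)

# Step 3: shrink the learning rate (raising n_estimators to compensate)
# and re-validate to settle on the final combination.
```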

 


Reposted from www.cnblogs.com/hapyygril/p/9293958.html