https://zhuanlan.zhihu.com/p/28672955
3.1 XGBoost's parameters fall into three categories (keeping the original English names here):
- General Parameters: control the overall functionality
- Booster Parameters: control the individual learner at each iteration
- Learning Task Parameters: control the optimization objective and the tuning steps
(1)General Parameters:
booster [default=gbtree]
- Selects the type of model to run at each iteration. There are two options:
- gbtree: tree-based models
- gblinear: linear models
silent [default=0]
- Set to 1 to suppress the running messages; set to 0 to print them.
nthread [default to maximum number of threads available if not set]
- Sets the number of cores used for parallel processing.
- To run on all available cores, leave it at the default.
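Putting the general parameters together, here is a minimal sketch using the native API; the data is synthetic and the nthread value is just an example:

```python
import numpy as np
import xgboost as xgb

# Synthetic binary-classification data, for illustration only.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "gbtree",            # tree-based booster (the default)
    "silent": 0,                    # 0 = print running messages
    "nthread": 4,                   # example value; omit to use all cores
    "objective": "binary:logistic",
}
model = xgb.train(params, dtrain, num_boost_round=10)
```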
(2)Booster Parameters
There are two kinds of boosters, linear and tree-based; the tree booster is by far the more commonly used. (A configuration sketch combining the parameters below appears at the end of this section.)
eta [default=0.3]
- Analogous to the learning rate in GBM.
- Makes the model more robust by shrinking the weights on each step.
- Typical values: 0.01-0.2
min_child_weight [default=1]
- Defines the minimum sum of weights of all observations required in a child.
- Used to control over-fitting: higher values prevent the model from learning relations which might be highly specific to the particular sample selected for a tree.
- Too high values can lead to under-fitting, hence it should be tuned using CV.
max_depth [default=6]
- The maximum depth of a tree, same as in GBM.
- Used to control over-fitting, as a higher depth allows the model to learn relations very specific to a particular sample.
- Should be tuned using CV.
- Typical values: 3-10
max_leaf_nodes
- The maximum number of terminal nodes (leaves) in a tree; another way to limit tree size and control over-fitting.
- Can be defined in place of max_depth. Since binary trees are created, a depth of 'n' would produce a maximum of 2^n leaves.
- If this is defined, GBM will ignore max_depth: once the leaf count is fixed, the implied depth of the binary tree takes precedence.
gamma [default=0]
- A node is split only when the resulting split gives a positive reduction in the loss function. Gamma specifies the minimum loss reduction required to make a split; with gamma=0, any split that reduces the loss is allowed.
- Makes the algorithm conservative. The appropriate value varies with the loss function and should be tuned.
max_delta_step [default=0]
- The maximum delta step we allow each tree's weight estimation to be. If set to 0, there is no constraint; a positive value makes the update step more conservative. (I still haven't fully understood this parameter...)
- Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced.
subsample [default=1]
- Same as the subsample of GBM. Denotes the fraction of observations to be randomly sampled for each tree, e.g. 0.8 means each tree is built on a random 80% sample of the training data.
- Lower values make the algorithm more conservative and prevent over-fitting, but values that are too small can lead to under-fitting.
- Typical values: 0.5-1
colsample_bytree [default=1]
- Similar to max_features in GBM. Denotes the fraction of columns to be randomly sampled for each tree, e.g. 0.8 means each tree is built from a random 80% of the columns.
- Typical values: 0.5-1
colsample_bylevel [default=1]
- Denotes the subsample ratio of columns for each split, at each level.
- I don't use this often because subsample and colsample_bytree will do the job for you, but you can explore it further if you like.
lambda [default=1]
- L2 regularization term on weights (analogous to Ridge regression).
- Handles the regularization part of XGBoost. Though many data scientists don't use it often, it is worth exploring to reduce over-fitting: L2 regularization discourages over-reliance on any single feature and spreads weight across more of them.
alpha [default=0]
- L1 regularization term on weights (analogous to Lasso regression).
- Can be used in case of very high dimensionality so that the algorithm runs faster: L1 regularization encourages sparse weights, which speeds up computation.
scale_pos_weight [default=1]
- A value greater than 0 should be used in case of high class imbalance as it helps in faster convergence.
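Putting the tree-booster knobs together, here is a minimal configuration sketch with the native API; every value below is an illustrative starting point, not a recommendation:

```python
import numpy as np
import xgboost as xgb

# Synthetic data, for illustration only.
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, size=200)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "eta": 0.1,                # learning rate, typically 0.01-0.2
    "max_depth": 5,            # typically 3-10; deeper trees over-fit more easily
    "min_child_weight": 1,     # raise to make splits more conservative
    "gamma": 0,                # minimum loss reduction required to split
    "subsample": 0.8,          # row sampling per tree, typically 0.5-1
    "colsample_bytree": 0.8,   # column sampling per tree, typically 0.5-1
    "lambda": 1,               # L2 regularization
    "alpha": 0,                # L1 regularization
    "scale_pos_weight": 1,     # increase when classes are imbalanced
}
model = xgb.train(params, dtrain, num_boost_round=100)
```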
(3)Learning Task Parameters
These parameters are used to define the optimization objective and the metric to be calculated at each step.
objective [default=reg:linear]
- This defines the loss function to be minimized. The most commonly used values are:
- binary:logistic: logistic regression for binary classification, returns predicted probability (not class)
- multi:softmax: multiclass classification using the softmax objective, returns predicted class (not probabilities)
- you also need to set an additional num_class (number of classes) parameter defining the number of unique classes
- multi:softprob: same as softmax, but returns predicted probability of each data point belonging to each class.
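For example, a multiclass setup needs both objective and num_class; a sketch with synthetic three-class data:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(150, 4)
y = np.random.randint(0, 3, size=150)   # three classes: 0, 1, 2
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "multi:softmax",  # returns the predicted class
    "num_class": 3,                # required for the multiclass objectives
}
model = xgb.train(params, dtrain, num_boost_round=10)
pred = model.predict(dtrain)       # class labels 0/1/2
# With multi:softprob instead, predict() returns one probability per class.
```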
eval_metric [ default according to objective ]
- The metric to be used for validation data.
- The default values are rmse for regression and error for classification.
- Typical values are:
- rmse – root mean square error
- mae – mean absolute error
- logloss – negative log-likelihood
- error – Binary classification error rate (0.5 threshold)
- merror – Multiclass classification error rate
- mlogloss – Multiclass logloss
- auc – Area under the curve
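A sketch of overriding the default metric and monitoring it on a held-out set; the data and split are illustrative:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)
dtrain = xgb.DMatrix(X[:150], label=y[:150])
dvalid = xgb.DMatrix(X[150:], label=y[150:])

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",   # overrides the classification default 'error'
}
# The metric is printed for each named evaluation set at every round.
model = xgb.train(params, dtrain, num_boost_round=20,
                  evals=[(dtrain, "train"), (dvalid, "valid")])
```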
seed [default=0]
- The random number seed.
- Can be used for generating reproducible results and also for parameter tuning; without a fixed seed, each run can produce different results.
If you have been using Scikit-Learn till now, these parameter names might not look familiar. The good news is that the xgboost module in Python has an sklearn wrapper called XGBClassifier, which follows the sklearn naming convention. The parameter names that change are:
- eta –> learning_rate
- lambda –> reg_lambda
- alpha –> reg_alpha
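A sketch of the same kind of configuration through the sklearn wrapper; the values are illustrative:

```python
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)

clf = XGBClassifier(
    learning_rate=0.1,   # eta in the native API
    reg_lambda=1,        # lambda in the native API
    reg_alpha=0,         # alpha in the native API
    max_depth=5,
    n_estimators=100,
)
clf.fit(X, y)
proba = clf.predict_proba(X)   # sklearn-style prediction API
```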
(4)Parameter Tuning
param_grid
If the dataset is not too large, you can enumerate all candidate parameter values in a param_grid and validate them with GridSearchCV (see the sketch after this list).
booster
Set some initial values, then tune step by step:
step1: With the learning rate and booster parameters fixed, tune n_estimators.
step2: With n_estimators and the booster parameters fixed, tune the learning rate.
step3: With n_estimators and the learning rate fixed, tune the booster parameters.
step4: Start with max_depth and min_child_weight, which have the largest impact, then work through every other booster parameter that may matter.
step5: Shrink the learning rate to find its best value.
step6: You end up with a parameter combination that performs reasonably well.
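A minimal GridSearchCV sketch for the enumeration route above; the grid values are illustrative, not recommendations:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)

param_grid = {
    "max_depth": [3, 5, 7],
    "min_child_weight": [1, 3, 5],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    XGBClassifier(n_estimators=100, random_state=27),  # fixed seed for reproducibility
    param_grid,
    scoring="roc_auc",   # pick a metric that matches your task
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```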