A Practical Guide to XGBoost Parameters

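Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_30024069/article/details/80402511
        XGBoost has many parameters, but only a limited number of them are used in practice. This article describes in detail what each of the parameters reported when training XGBoost does. The official XGBoost parameter reference is at http://xgboost.readthedocs.io/en/latest/parameter.html#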
                

1. General Parameters

  • booster [default=gbtree]
        which booster to use, can be gbtree, gblinear or dart. gbtree and dart use tree based model while gblinear uses linear function.
  • silent [default=0]
        0 means running messages are printed, 1 means silent mode. Later XGBoost releases deprecate this parameter in favor of verbosity.
  • nthread [default to maximum number of threads available if not set]
        number of parallel threads used to run xgboost
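
As a minimal sketch, the general parameters above can be collected in a plain dictionary of the kind usually passed as the params argument of xgboost.train. The helper name general_params and its validation rules are assumptions made for illustration; they are not part of XGBoost.

```python
# Booster names listed in the description above.
ALLOWED_BOOSTERS = {"gbtree", "gblinear", "dart"}


def general_params(booster="gbtree", silent=0, nthread=None):
    """Build a dict of general parameters (hypothetical helper, not an XGBoost API)."""
    if booster not in ALLOWED_BOOSTERS:
        raise ValueError(f"unknown booster: {booster!r}")
    if silent not in (0, 1):
        raise ValueError("silent must be 0 (print messages) or 1 (silent mode)")
    params = {"booster": booster, "silent": silent}
    # Leaving nthread out lets XGBoost fall back to all available threads.
    if nthread is not None:
        params["nthread"] = nthread
    return params


params = general_params(booster="dart", nthread=4)
print(params)
```

Omitting nthread from the dictionary, rather than writing a placeholder value, keeps the documented default (the maximum number of available threads) in effect.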

2. Parameters for Tree Booster

  • eta [default=0.3, alias: learning_rate]
        step size shrinkage used in each update to prevent overfitting. After each boosting step, we can directly get the weights of the new features, and eta shrinks these weights to make the boosting process more conservative.
        range: [0,1]
  • gamma [default=0, alias: min_split_loss]
        minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be.
        range: [0,∞]
  • max_depth [default=6]
        maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. 0 indicates no limit; a limit is required for the depth-wise grow policy.
        range: [0,∞]
  • min_child_weight [default=1]
        minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node whose sum of instance weight is less than min_child_weight, the building process gives up further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger min_child_weight is, the more conservative (stable) the algorithm will be.
        range: [0,∞]
  • max_delta_step [default=0]
        maximum delta step allowed for each tree's weight estimation. A value of 0 means there is no constraint. A positive value can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value of 1-10 might help control the update.
        range: [0,∞]
  • subsample [default=1]
        subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the data instances to grow trees, which helps prevent overfitting.
        range: (0,1]
  • colsample_bytree [default=1]
        subsample ratio of columns when constructing each tree.
        range: (0,1]
  • colsample_bylevel [default=1]
        subsample ratio of columns for each split, in each level.
        range: (0,1]
  • lambda [default=1, alias: reg_lambda]
        L2 regularization term on weights. Increasing this value makes the model more conservative.
  • alpha [default=0, alias: reg_alpha]
        L1 regularization term on weights. Increasing this value makes the model more conservative.
  • scale_pos_weight [default=1]
        controls the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: sum(negative cases) / sum(positive cases). See Parameters Tuning in the official documentation for more discussion.
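
To make the roles of several tree booster parameters more concrete, the sketch below implements the second-order leaf weight and split gain formulas from the XGBoost paper in plain Python. It is an illustration under stated assumptions, not the library's code: all function names are hypothetical, the inputs are sums of gradients and hessians over the instances in a node, and the library's internal scaling and pruning details may differ.

```python
import math


def soft_threshold(grad_sum, reg_alpha):
    """Shrink the gradient sum toward zero by reg_alpha (the L1 penalty, alpha)."""
    if grad_sum > reg_alpha:
        return grad_sum - reg_alpha
    if grad_sum < -reg_alpha:
        return grad_sum + reg_alpha
    return 0.0


def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0,
                max_delta_step=0.0, eta=0.3):
    """Regularized leaf weight, capped by max_delta_step, then shrunk by eta."""
    weight = -soft_threshold(grad_sum, reg_alpha) / (hess_sum + reg_lambda)
    # max_delta_step = 0 means no constraint on the step.
    if max_delta_step > 0 and abs(weight) > max_delta_step:
        weight = math.copysign(max_delta_step, weight)
    return eta * weight


def structure_score(grad_sum, hess_sum, reg_lambda, reg_alpha):
    """Quality score of a node; a larger lambda lowers every score."""
    return soft_threshold(grad_sum, reg_alpha) ** 2 / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right,
               reg_lambda=1.0, reg_alpha=0.0, gamma=0.0):
    """Loss reduction of a split minus the gamma penalty for the extra leaf."""
    left = structure_score(g_left, h_left, reg_lambda, reg_alpha)
    right = structure_score(g_right, h_right, reg_lambda, reg_alpha)
    parent = structure_score(g_left + g_right, h_left + h_right,
                             reg_lambda, reg_alpha)
    return 0.5 * (left + right - parent) - gamma


def accept_split(g_left, h_left, g_right, h_right,
                 min_child_weight=1.0, **regularization):
    """Reject splits with a too-light child or a non-positive gain."""
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, **regularization) > 0.0


def suggested_scale_pos_weight(labels):
    """sum(negative cases) / sum(positive cases) for 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return negatives / positives


print(split_gain(-6.0, 2.0, 4.0, 1.0))                 # gain with default lambda
print(accept_split(-6.0, 2.0, 4.0, 1.0, gamma=10.0))   # large gamma blocks the split
print(leaf_weight(-6.0, 2.0, max_delta_step=1.0))      # capped, then scaled by eta
```

In this sketch, raising gamma, lambda, alpha or min_child_weight can only turn an accepted split into a rejected one or pull a leaf weight toward zero, which is the sense in which the descriptions above call larger values "more conservative".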

3. Learning Task Parameters

    Specify the learning task and the corresponding learning objective. The objective options are below:

  • objective [default=reg:linear]
        1. “reg:linear”: linear regression (renamed reg:squarederror in later XGBoost releases)
        2. “reg:logistic”: logistic regression
        3. “binary:logistic”: logistic regression for binary classification, outputs a probability
        4. “binary:logitraw”: logistic regression for binary classification, outputs the score before the logistic transformation
        5. “multi:softmax”: sets XGBoost to do multiclass classification using the softmax objective; you also need to set num_class (number of classes)
        6. “multi:softprob”: same as softmax, but outputs a vector of ndata * nclass values, which can be reshaped into an ndata × nclass matrix. The result contains the predicted probability of each data point belonging to each class.
        7. “reg:gamma”: gamma regression with log-link. The output is the mean of a gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.
  • base_score [default=0.5]
        the initial prediction score of all instances (global bias). With a sufficient number of iterations, changing this value will not have much effect.
  • seed [default=0]
        random number seed.
  • n_estimators [default=100]
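        number of child models, i.e. boosted trees to fit (a parameter of the scikit-learn wrapper).
  • n_jobs [default=1]
        the number of jobs to use for the computation (a parameter of the scikit-learn wrapper).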
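
The differences between the output formats of the objectives above can be sketched without XGBoost itself. The helpers below are hypothetical and work on made-up numbers: sigmoid maps a “binary:logitraw” score to the probability that “binary:logistic” would report, reshape_softprob turns a flat “multi:softprob” vector into one row per data point (assuming the values are stored row by row), and softmax_labels picks the most probable class, which is what “multi:softmax” returns.

```python
import math


def sigmoid(margin):
    """Logistic transformation of a raw score (margin) into a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def reshape_softprob(flat_probs, num_class):
    """Reshape a flat vector of ndata * nclass values into ndata rows."""
    if len(flat_probs) % num_class != 0:
        raise ValueError("length must be a multiple of num_class")
    return [flat_probs[i:i + num_class]
            for i in range(0, len(flat_probs), num_class)]


def softmax_labels(prob_rows):
    """Index of the most probable class in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


# Illustrative values only: two data points, three classes.
flat = [0.7, 0.2, 0.1, 0.1, 0.3, 0.6]
rows = reshape_softprob(flat, num_class=3)
labels = softmax_labels(rows)

print(sigmoid(0.0))  # a raw score of 0 corresponds to probability 0.5
print(rows)
print(labels)
```

The raw score of 0 mapping to a probability of 0.5 also matches the default base_score of 0.5 listed above.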










                
