XGBoost Feature Importance Calculation

XGBoost provides three ways to compute feature importance:

- 'weight' - the number of times a feature is used to split the data across all trees.
- 'gain' - the average gain of the feature when it is used in trees.
- 'cover' - the average coverage of the feature when it is used in trees.
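All three can be queried from a trained model. Here is a minimal sketch using the Python package (the synthetic dataset and feature names below are made up for illustration):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y, feature_names=["f0", "f1", "f2", "f3"])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)

for imp_type in ("weight", "gain", "cover"):
    # get_score returns {feature_name: score}; features that are never
    # used in any split are simply absent from the dict.
    print(imp_type, booster.get_score(importance_type=imp_type))
```

The sklearn wrapper exposes the same choice through the `importance_type` constructor argument, reflected in `feature_importances_`.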

In short:

- weight is the total number of nodes, across all trees, in which the feature is used to split the data;
- gain is the average gain over all splits that use the feature;
- cover is a bit more obscure. [R-package/man/xgb.plot.tree.Rd](https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd) explains it in more detail: "the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be." In other words, a node's coverage is the sum of the second-order gradients (Hessians) of the training samples routed to that node, and the feature's importance is its average coverage over the splits that use it.
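To make the cover definition concrete, here is a sketch of the squared-loss case (again with a made-up regression dataset): under `reg:squarederror` every sample's second-order gradient is 1, so a node's Cover should simply equal the number of training rows routed to it.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] + 0.1 * rng.normal(size=500)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=3)

# trees_to_dataframe lists one row per node; Cover is the sum of Hessians
# of the samples reaching that node. With squared loss every Hessian is 1,
# so the root's Cover should equal the number of training rows (500 here).
df = booster.trees_to_dataframe()
print(df[(df["Tree"] == 0) & (df["Node"] == 0)][["Feature", "Gain", "Cover"]])
```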

Returning to the example from Li Hang's book, we can color-code the features and plot the trees, as in the figure below.
[Figure: the example trees, with each feature drawn in a different color]
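For a quick visual comparison of the three metrics on your own model, a sketch using xgboost's built-in plotting helper (assumes matplotlib is installed and `booster` is the model trained in the first sketch above):

```python
import matplotlib.pyplot as plt
import xgboost as xgb

# importance_type accepts "weight", "gain", or "cover".
xgb.plot_importance(booster, importance_type="gain", show_values=False)
plt.tight_layout()
plt.show()
```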


Reposted from www.cnblogs.com/cupleo/p/9951436.html