Feature importance -- feature_importance

 Taking random forest as an example, the feature importance mechanism helps with model interpretability. Consider that even a decision tree, a highly interpretable model, becomes hard for a human to explain once the tree grows too large.

 A random forest is usually made up of hundreds of trees, which makes it even harder to interpret. Fortunately, we can identify which features matter most, and that helps us explain the model. More importantly, we can drop unimportant features to reduce noise, and unlike the components produced by PCA dimensionality reduction, the result stays human-interpretable.
There are two common ways to implement feature importance:
  (1) mean decrease in node impurity: 

Feature importance is calculated by looking at the splits of each tree.
The importance of the splitting variable is proportional to the improvement in the Gini index produced by that split, and it is accumulated (for each variable) over all the trees in the forest.

       In other words, for every tree we compute the improvement in the splitting criterion (Gini or entropy) contributed by each splitting feature, then aggregate over all trees in the forest to obtain the feature weights.

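As a concrete illustration, here is a minimal sketch using scikit-learn's RandomForestClassifier, whose feature_importances_ attribute is an impurity-based importance of this kind; the breast-cancer dataset and the hyperparameters are illustrative assumptions, not something taken from the original post.

```python
# Minimal sketch: impurity-based (mean decrease in impurity) importance.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)

# feature_importances_ is the impurity decrease contributed by each feature,
# averaged over all trees and normalized to sum to 1.
top5 = sorted(zip(X.columns, rf.feature_importances_),
              key=lambda t: t[1], reverse=True)[:5]
for name, imp in top5:
    print(f"{name}: {imp:.4f}")
```
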
  (2) mean decrease in accuracy:

 This method, proposed in the original paper, passes the OOB samples down each tree and records the prediction accuracy.
A variable is then selected and its values in the OOB samples are randomly permuted. The OOB samples are passed down the tree again and the accuracy is recomputed.
The decrease in accuracy caused by this permutation, averaged over all trees, gives the importance of that variable (the higher the decrease, the higher the importance).

    Simply put, if a feature is really important, then even a small perturbation of its values will have a large effect on the model's predictions.

             Fabricating new data by hand is tedious, so we simply shuffle that feature's column in the OOB set and evaluate again (no retraining is needed); the accuracy before shuffling minus the accuracy after shuffling is that feature's importance. This method is also known as permutation importance.
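
As a sketch of this idea, the code below uses scikit-learn's permutation_importance utility; note that it shuffles each feature on an explicit held-out split rather than on each tree's OOB samples, which is a simplifying assumption made here. The dataset and parameters are again illustrative.

```python
# Minimal sketch: permutation (mean decrease in accuracy) importance.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature column in the evaluation set several times and measure
# the resulting drop in accuracy; no retraining is involved.
result = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                random_state=0, scoring="accuracy")

top5 = sorted(zip(X.columns, result.importances_mean),
              key=lambda t: t[1], reverse=True)[:5]
for name, mean_drop in top5:
    print(f"{name}: {mean_drop:.4f}")
```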



Reposted from www.cnblogs.com/wqbin/p/12803594.html