Step 32 of Machine Learning Classification in Practice: SHAP

Continuing to fill in earlier gaps, this time it's SHAP. This section is optional; read on if you're interested.

We built ten ML models earlier. If the winner turns out to be a tree model such as XGBoost, LightGBM, or CatBoost (and these usually do come out on top), we can use SHAP to visualize and interpret the model.

(1) First, install SHAP with pip install shap. Remember to type it in the Anaconda Prompt:

(2) Then, using XGBoost as the example, let's get started:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('X disease code fs.csv')
X = dataset.iloc[:, 1:14].values   # the 13 predictor columns
Y = dataset.iloc[:, 0].values      # the outcome column

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=666)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

import xgboost as xgb
# Hyperparameter grid (here fixed to a single tuned combination)
param_grid = [{
    'n_estimators': [35],
    'eta': [0.1],
    'max_depth': [1],
    'gamma': [0],
    'min_child_weight': [5],
    'max_delta_step': [1],
    'subsample': [0.8],
    'colsample_bytree': [0.8],
    'colsample_bylevel': [0.8],
    'reg_lambda': [9],
    'reg_alpha': [5],
}]

boost = xgb.XGBClassifier()
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(boost, param_grid, n_jobs=-1, verbose=2, cv=10)
grid_search.fit(X_train, y_train)
classifier = grid_search.best_estimator_
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
y_testprba = classifier.predict_proba(X_test)[:, 1]
y_trainpred = classifier.predict(X_train)
y_trainprba = classifier.predict_proba(X_train)[:, 1]

from sklearn.metrics import confusion_matrix
cm_test = confusion_matrix(y_test, y_pred)
cm_train = confusion_matrix(y_train, y_trainpred)
print(cm_train)
print(cm_test)

# Draw the SHAP plots (install SHAP first: pip install shap)
import shap
explainer = shap.TreeExplainer(classifier)   # TreeExplainer is for tree-based models
shap.initjs()
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train)

The output looks like this:

Here, Feature 0 is B, the first feature we imported, and the rest follow in column order from left to right:

You can compare this with XGBoost's built-in feature importances; the two are broadly similar:

For the underlying theory and interpretation, see the links below; I won't go into detail here:

SHAP知识点全汇总 - 知乎 (a Zhihu roundup of SHAP concepts)

https://www.kaggle.com/code/dansbecker/shap-values/tutorial
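As a taste of what those references cover: a SHAP value is a feature's average marginal contribution over all possible feature subsets. For a toy model this definition can be evaluated exactly by brute force (illustrative only; the cost is exponential, which is why shap's TreeExplainer uses a specialized tree algorithm instead). Absent features are filled in from a baseline point, a simplifying independence assumption:

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at point x, relative to a baseline point."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in itertools.combinations(others, size):
                # Shapley weight for a coalition of this size
                w = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                z_without = baseline.copy()
                for j in S:
                    z_without[j] = x[j]
                z_with = z_without.copy()
                z_with[i] = x[i]
                phi[i] += w * (f(z_with) - f(z_without))
    return phi

# Toy linear model: each Shapley value should be coef * (x - baseline)
f = lambda z: 2 * z[0] + 3 * z[1] - z[2]
x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
print(exact_shapley(f, x, baseline))   # approximately [ 2.  6. -3.]
```

Note the additivity property the summary plot relies on: the values sum to f(x) minus the baseline prediction.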


Reposted from blog.csdn.net/qq_30452897/article/details/130665917