Overview
The "black box" dilemma of machine learning models
Machine learning models have risen faster than anyone expected. Whether it's predicting house prices, identifying cats and dogs in pictures, or recommending your favorite music, these models perform extremely well. But have you ever wondered how exactly they make these decisions?
As Python enthusiasts, we naturally want to understand the principles behind a model. The good news is that two libraries, SHAP and LIME, can help! They reveal a model's inner workings, allowing us to better understand and optimize it.
One: What exactly is the SHAP value?
SHAP (SHapley Additive exPlanations) is a method for explaining machine learning models based on Shapley values from game theory. The core idea of the Shapley value is to assign a contribution value to each feature, indicating that feature's influence on the prediction result.
1.1 Calculation method of SHAP value
First, we need to install the shap library:
!pip install shap
Suppose we have trained a model with Scikit-Learn. To calculate SHAP values, we first need to initialize a KernelExplainer object:
import shap
explainer = shap.KernelExplainer(model.predict, X_train)
Then we can use the shap_values method to calculate the SHAP value of each feature:
shap_values = explainer.shap_values(X_test)
In this way, we get the contribution value of each feature to each predicted sample.
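To build intuition for what the library is estimating, here is a minimal, self-contained sketch that computes exact Shapley values for a toy two-feature linear model by brute force, averaging each feature's marginal contribution over all orderings. The model, sample, and background values are made up for illustration; KernelExplainer approximates this same quantity for real models, where "missing" features are filled in from background data.

```python
from itertools import permutations

# Toy "model": a linear function of two features. For a linear model the
# exact Shapley value of feature i is w_i * (x_i - background_i), which
# lets us sanity-check the brute-force result below.
def model(x1, x2):
    return 2 * x1 + 3 * x2

background = {"x1": 1.0, "x2": 2.0}  # reference values for "missing" features
sample = {"x1": 3.0, "x2": 5.0}      # the instance we want to explain

features = list(sample)

def value(coalition):
    # Features in the coalition take the sample's value; the rest are
    # filled in from the background.
    args = {f: (sample[f] if f in coalition else background[f]) for f in features}
    return model(**args)

# Shapley value: average marginal contribution over all feature orderings.
shap_values = {f: 0.0 for f in features}
orderings = list(permutations(features))
for order in orderings:
    coalition = set()
    for f in order:
        before = value(coalition)
        coalition.add(f)
        shap_values[f] += (value(coalition) - before) / len(orderings)

print(shap_values)  # {'x1': 4.0, 'x2': 9.0}: 2*(3-1) and 3*(5-2)
```

Note the additivity property: the contributions sum exactly to the model's prediction minus the baseline prediction, which is what makes SHAP explanations internally consistent.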
1.2 Analyzing the model with SHAP values
The SHAP library provides several visualization methods to help us analyze the model more intuitively. For example, we can use the summary_plot method to plot an overview of the SHAP values:
shap.summary_plot(shap_values, X_test)
This plot shows each feature's SHAP value as a function of the feature value. From it, we can see that different features influence the prediction results to very different degrees.
Two: How does LIME reveal the local characteristics of the model?
LIME (Local Interpretable Model-Agnostic Explanations) is another way to explain machine learning models. Its main idea is to fit a simple linear model around each predicted sample, which helps us understand the model's behavior locally.
2.1 Using LIME to analyze the model
First, we need to install the lime library:
!pip install lime
Suppose we have trained a model with Scikit-Learn. To use LIME, we first need to create a LimeTabularExplainer object:
from lime.lime_tabular import LimeTabularExplainer
explainer = LimeTabularExplainer(X_train.values, feature_names=X_train.columns, class_names=['prediction'], verbose=True)
We can then generate a LIME explanation for a particular prediction sample:
i = 42 # pick an arbitrary sample
exp = explainer.explain_instance(X_test.values[i], model.predict_proba)
Finally, we can visualize the LIME explanation with the show_in_notebook method:
exp.show_in_notebook()
This way we can see a simple linear model showing the contribution of each feature to the prediction.
2.2 Limitations of LIME
Although LIME helps us understand the model's local behavior, it also has limitations. For example, because it relies on a simple linear surrogate, it may not capture the behavior of complex models well beyond a small neighborhood.
Three: Comparison of SHAP and LIME
Now that we have learned about the two libraries SHAP and LIME, it is natural to have a question: what is the difference between them, and how to choose?
3.1 Similarities and differences between the two
Let's start by summarizing their similarities:
- Both can help us explain machine learning models;
- Both assign a contribution value to each feature;
- Both support Scikit-Learn models.
And the differences:
- SHAP is based on the Shapley value and has a solid theoretical foundation;
- LIME focuses on local behavior, explaining a complex model with a simple one;
- SHAP can capture interactions between features, while LIME cannot.
3.2 How to choose?
SHAP and LIME each have their advantages and disadvantages, but in general SHAP rests on a stronger theoretical foundation and can capture interactions between features, so in most cases we recommend the SHAP library. If you are mainly interested in local behavior, though, LIME is also a good choice.
Technical summary
Through these methods, we can better understand a model's inner workings, and then optimize it and improve prediction accuracy. Finally, feel free to leave a message in the comment area to share your insights and tell us how you use this knowledge to solve practical problems!