Interpreting machine learning models: explaining Scikit-Learn models with the SHAP and LIME libraries


Overview

The "black box" dilemma of machine learning models

Machine learning models have taken the world by storm! Whether it's predicting house prices, telling cats from dogs in pictures, or recommending your favorite music, these models perform extremely well. But have you ever wondered how exactly they reach their decisions?

As Python enthusiasts, we naturally want to understand the principles behind our models. The good news is that two libraries, SHAP and LIME, can help: they reveal how a model arrives at its predictions, allowing us to better understand and improve it.


One: What exactly is a SHAP value?

SHAP (SHapley Additive exPlanations) is a method for explaining machine learning models based on Shapley values from game theory. The core idea of the Shapley value is to assign each feature a contribution value that quantifies its influence on the prediction result.
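For reference, the Shapley value of a feature i is the weighted average of its marginal contributions over all subsets of the other features. In the standard formula from cooperative game theory, F is the set of all features and v(S) is the model's prediction using only the features in S:

\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left( v(S \cup \{i\}) - v(S) \right)

Evaluating this exactly requires running the model on every feature subset, which is why the shap library relies on approximations such as kernel-based sampling.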

1.1 Calculation method of SHAP value

First, we need to install the shap library:

!pip install shap

Suppose we have trained a model with Scikit-Learn. To calculate SHAP values, we first initialize a KernelExplainer object:

import shap

# model.predict is the prediction function to explain;
# X_train serves as the background (reference) dataset
explainer = shap.KernelExplainer(model.predict, X_train)

Then you can use the shap_values method to calculate the SHAP value of each feature:

shap_values = explainer.shap_values(X_test)

This gives us the contribution of every feature to every explained sample.
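To make these steps concrete, here is a minimal end-to-end sketch. The diabetes dataset, the random forest, and the background sampling are illustrative assumptions, not part of the original snippets:

import shap
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative setup: a regression model trained with Scikit-Learn
data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_train, X_test, y_train, y_test = train_test_split(X, data.target, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# KernelExplainer is model-agnostic but slow, so we summarize the
# background data with a small sample instead of passing all of X_train
background = shap.sample(X_train, 100)
explainer = shap.KernelExplainer(model.predict, background)

# One contribution value per feature, per explained sample
shap_values = explainer.shap_values(X_test.iloc[:10])
print(shap_values.shape)  # (10, number_of_features)

As an aside, for tree-based models like this one, shap.TreeExplainer computes exact SHAP values much faster than KernelExplainer.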

1.2 Analyzing the model with SHAP values

The SHAP library provides several visualization methods to help us analyze the model more intuitively. For example, we can use the summary_plot method to plot an overview of the SHAP values:

shap.summary_plot(shap_values, X_test)

This plot shows, for every feature, the distribution of its SHAP values across the samples, colored by the feature's value. From it we can see that features differ considerably in how strongly they influence the predictions.
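Two other common views, sketched under the assumption that shap_values was computed for X_test.iloc[:10] as in the example above:

# Global importance: mean absolute SHAP value per feature, as a bar chart
shap.summary_plot(shap_values, X_test.iloc[:10], plot_type="bar")

# Local view: how each feature pushes one prediction away from the baseline
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0], matplotlib=True)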

Two: How does LIME reveal the local characteristics of the model?

LIME (Local Interpretable Model-Agnostic Explanations) is another way to explain machine learning models. Its main idea is to fit a simple linear model in the neighborhood of each predicted sample, which helps us understand the model's behavior locally.
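Formally, the LIME paper defines the explanation for an instance x as the simple model g (drawn from a class G, e.g. sparse linear models) that best approximates the original model f near x, as weighted by a locality kernel \pi_x, while staying simple:

\xi(x) = \arg\min_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)

Here \mathcal{L} measures how poorly g matches f on perturbed samples around x, and \Omega(g) penalizes the complexity of g.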

2.1 Using LIME to analyze the model

First, we need to install the lime library:

!pip install lime

Suppose we have trained a model with Scikit-Learn. To use LIME, we first create a LimeTabularExplainer object:

from lime.lime_tabular import LimeTabularExplainer

# feature_names should be a plain list of column names
explainer = LimeTabularExplainer(X_train.values, feature_names=list(X_train.columns), class_names=['prediction'], verbose=True)

We can then generate a LIME explanation for a particular sample:

i = 42  # pick an arbitrary sample
exp = explainer.explain_instance(X_test.values[i], model.predict_proba)

Finally, we can visualize the LIME explanation with the show_in_notebook method:

exp.show_in_notebook()

This displays the local linear model, showing each feature's contribution to this particular prediction.
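Putting the pieces together, here is a minimal runnable sketch for a classifier. The breast-cancer dataset, the random forest, and the num_features setting are illustrative assumptions:

import pandas as pd
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative setup: a binary classifier trained with Scikit-Learn
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_train, X_test, y_train, y_test = train_test_split(X, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=list(data.target_names),
    mode='classification',
)

# Explain one test sample with a local sparse linear model
i = 42
exp = explainer.explain_instance(X_test.values[i], model.predict_proba, num_features=5)
print(exp.as_list())  # (feature condition, weight) pairs from the local model

Outside a notebook, exp.as_list() or exp.as_pyplot_figure() are handy alternatives to show_in_notebook.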

2.2 Limitations of LIME

Although LIME helps us understand the local behavior of a model, it has limitations. Because it relies on a simple linear surrogate, its explanation is only valid near the sample being explained, and it may fail to capture strongly non-linear behavior or feature interactions in complex models.

Three: Comparison of SHAP and LIME

Now that we have met both SHAP and LIME, a natural question arises: how do they differ, and which one should we choose?

3.1 Similarities and differences between the two

Let's start by summarizing their similarities:

  1. Both help us explain machine learning models;

  2. Both assign a contribution value to each feature;

  3. Both support models in Scikit-Learn.

And the differences:

  1. SHAP is grounded in Shapley values and thus has a solid theoretical foundation;

  2. LIME focuses on local behavior, explaining a complex model with a simple surrogate around each sample;

  3. SHAP can capture interactions between features, while LIME's linear surrogate cannot (see the sketch just after this list).
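To illustrate the last point: for tree models, shap's TreeExplainer exposes pairwise interaction values. A short sketch, assuming the random forest and data from the SHAP example earlier:

# Interaction values have shape (n_samples, n_features, n_features);
# entry [k, i, j] is the joint contribution of features i and j for sample k
tree_explainer = shap.TreeExplainer(model)
interaction_values = tree_explainer.shap_interaction_values(X_test.iloc[:10])
print(interaction_values.shape)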

3.2 How to choose?

Both SHAP and LIME have their own strengths and weaknesses, but SHAP rests on a firmer theoretical foundation and can capture feature interactions, so in most cases we recommend the SHAP library. If you are mainly interested in local, per-sample explanations, LIME is also a good choice.

Technical summary

With these methods, we can better understand how a model arrives at its predictions, and use that understanding to refine the model and improve its accuracy. Finally, feel free to leave a message in the comments to share your insights and tell us how you have used these techniques to solve real problems!
