Let's Talk About Machine Learning Model Interpretability

  With the development of AI and machine learning, more and more decisions will be automated and handed over to machine learning algorithms. But when we entrust very important decisions to machines, what exactly are we worried about? When Boeing's flight control software ignores the pilot's commands and decides to pitch the aircraft back toward the earth; when the banking system inexplicably rejects your loan application; when an automated identification-friend-or-foe weapon system decides to open fire on innocent civilians; people cannot help but ask loudly, "Why?"
  
   [Figure: a machine learning algorithm as a black box]
  
  A machine learning algorithm can be seen as the black box shown above: training data flows into the black box, which produces a trained function (this function can also be called a model); feeding new input data into the function returns a prediction. Model interpretability is about answering the question "why": how do we interpret this function, and how does it arrive at its predictions?
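  As a minimal sketch of this train-then-predict view (the synthetic data and the gradient-boosting model below are illustrative assumptions, not from the original article):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Training data flows into the "black box"...
X_train, y_train = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# ...and the trained function maps new inputs to predictions.
X_new, _ = make_regression(n_samples=3, n_features=5, noise=10.0, random_state=1)
print(model.predict(X_new))
```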
  
  Interpretable Models
  
  Among machine learning algorithms, some models are very hard to explain, deep neural networks for example. A deep neural network can fit highly complex data using a vast number of parameters, but those parameters are very difficult to interpret. Still, quite a few algorithms are relatively easy to explain.
  
  Take linear regression, for example:
  
   y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε
  
  Linear regression assumes the relationship between the target y and the features x shown in the formula above. Interpreting a linear regression model is therefore simple: for a particular feature xᵢ, each additional unit increases the target y by βᵢ, holding the other features fixed.
  
  Linear regression is easy to use and is guaranteed to find the optimal solution. But, after all, not every problem is linear.
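  A minimal sketch of reading this interpretation off a fitted model (the diabetes dataset is an illustrative choice, not from the original article):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = LinearRegression().fit(X, y)

# Each coefficient beta_i is the change in the prediction for a
# one-unit increase in feature x_i, holding the other features fixed.
for name, beta in zip(X.columns, model.coef_):
    print(f"{name:>6}: {beta:+.1f}")
```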
  
  A further example of an interpretable model is the decision tree.
  
  Decision tree with artificial data. Instances with a value greater than 3 for feature x1 end up in node 5. All other instances are assigned to node 3 or node 4, depending on whether values of feature x2 exceed 1.
  
  As the example above shows, a decision tree gives a clear basis for each prediction. Explaining how a decision tree predicts is very simple: start at the root, follow the branches according to the feature values, and once a leaf node is reached, read off the final prediction.
  
  A good tree can capture interactions and dependencies between features, and the tree structure can be visualized well. However, decision trees handle linear relationships poorly; they are not smooth and not stable, and a small change in the data can alter the entire tree. When the number of nodes and the depth of the tree grow large, explaining the whole decision process becomes correspondingly difficult.
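  A small sketch of this kind of rule-based explanation on artificial data (the thresholds below are illustrative, not the ones in the figure):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Artificial data: the target depends on a threshold in x1 and then on x2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(300, 2))
y = np.where(X[:, 0] > 3, 10, np.where(X[:, 1] > 1, 5, 0)) + rng.normal(0, 0.1, 300)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# The fitted tree can be read directly as a set of if/else rules,
# which is exactly what makes it easy to explain.
print(export_text(tree, feature_names=["x1", "x2"]))
```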
  
  There are other interpretable models as well, such as logistic regression, generalized linear models, naive Bayes, k-nearest neighbors, and so on.
  
  Model-Agnostic Methods
  
  The range of interpretable models is limited, so we would like methods that can provide an explanation for any black-box machine learning model. This is where model-agnostic methods come in.
  
  Partial Dependence Plot (PDP)
  
  A PDP shows the influence of one or two features on the model's predictions.
  
  The figure above shows PDPs for three features, temperature, humidity, and wind speed, against the predicted number of bike rentals (note that these are three separate PDPs; a PDP assumes the features are independent). Each plot shows the trend under the assumption that the other features stay unchanged.
  
  PDPs are very intuitive, easy to understand, and easy to compute. But a PDP can show at most two features, because more than three dimensions cannot be drawn with current plotting techniques. At the same time, the independence assumption is the PDP's biggest problem.
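  A hedged sketch of computing PDPs with scikit-learn (synthetic data stands in for the bike-rental example; the plotting API assumes a recent scikit-learn version):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=1000, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# One PDP per listed feature: the average prediction as that feature varies,
# with the other features held at their observed values.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1, 2])
plt.show()
```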
  
  Individual Conditional Expectation (ICE)
  
  An ICE plot shows, for each individual sample, how the prediction changes as one feature is varied.
  
  As shown above, the ICE curves reflect a trend consistent with the PDP, but they include every individual sample.
  
  ICE shares all of the PDP's limitations: the independence assumption and the restriction to at most two features. At the same time, as the number of samples grows, the plot becomes quite crowded.
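  The same scikit-learn helper can draw ICE curves by switching the `kind` argument (again a sketch on synthetic data):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=300, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="individual" draws one curve per sample (ICE);
# kind="both" also overlays their average, which is the PDP.
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="both")
plt.show()
```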
  
  Feature Interaction
  
  The figure above illustrates feature interaction. For example, if a model has two features, its prediction can be decomposed into a constant term, a term containing only the first feature, a term containing only the second feature, and an interaction term involving both features, i.e. prediction = constant + f1(x1) + f2(x2) + f12(x1, x2). Using Friedman's H-statistic, we can quantify this feature interaction.
  
  Computing the H-statistic is resource-intensive, and the result is not very stable.
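  scikit-learn has no built-in H-statistic helper, so the following is only a rough, hand-rolled sketch of Friedman's H² for a pair of features, evaluated on a subsample of a numpy array X to keep the cost down:

```python
import numpy as np

def pd_at(model, X, features, x_row):
    """Partial dependence of the model at x_row's values for `features`,
    averaging the prediction over the other features' observed values."""
    X_mod = X.copy()
    for j in features:
        X_mod[:, j] = x_row[j]
    return model.predict(X_mod).mean()

def h_statistic(model, X, j, k, n_points=100, seed=0):
    """Rough sketch of Friedman's H^2 for the interaction of features j and k."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_points, len(X)), replace=False)
    pd_j = np.array([pd_at(model, X, [j], X[i]) for i in idx])
    pd_k = np.array([pd_at(model, X, [k], X[i]) for i in idx])
    pd_jk = np.array([pd_at(model, X, [j, k], X[i]) for i in idx])
    # Center the partial-dependence functions before comparing them.
    pd_j, pd_k, pd_jk = pd_j - pd_j.mean(), pd_k - pd_k.mean(), pd_jk - pd_jk.mean()
    # H^2: the share of the joint effect's variance that the two individual
    # effects cannot explain; 0 means no interaction.
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)
```

  Each call evaluates the model three times per sampled point over the whole data set, which is why the H-statistic is so expensive in practice.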
  
  Feature Importance
  
  Feature importance is defined as the change in prediction error caused by changing (permuting) a feature's values. How should we understand this? If changing a feature's values causes a large change in the prediction error, that feature has a large influence on the model; conversely, if changing another feature's values has no effect on the prediction error, that feature does not matter.
  
  The figure above is an illustration of feature importance.
  
  Feature importance provides a highly compressed, global insight into the model; it accounts for all interactions with other features, and computing it does not require retraining the model. However, computing this value requires data that contains the true outcomes.
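  A sketch of permutation feature importance with scikit-learn (note that it needs the true targets y, matching the point above; the dataset is an illustrative choice):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the score drops;
# a large drop means the model relied heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, mean, std in zip(X.columns, result.importances_mean, result.importances_std):
    print(f"{name:>6}: {mean:.3f} +/- {std:.3f}")
```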
  
  Shapley Values
  
  The Shapley value is a very interesting tool. It treats each feature as a player in a game, and each player contributes something to the prediction. For each prediction, the Shapley value gives the contribution of every feature to that prediction result.
  
  
  The following figure shows an example of Shapley values.
  
  Shapley values provide a complete explanation of each feature's contribution. But they are likewise expensive to compute, and they require using all the features.
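  A hedged sketch using the third-party shap package (pip install shap; the calls below assume a recent shap release):

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)        # efficient explainer for tree models
shap_values = explainer.shap_values(X[:50])  # per-feature contributions per prediction

# Contribution of each feature to the first prediction.
for name, value in zip(X.columns, shap_values[0]):
    print(f"{name:>6}: {value:+.2f}")
```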
  
  Surrogate Model
  
  A surrogate model is a simpler, more interpretable model that is trained on the inputs and predictions of the black-box model; this surrogate is then used to interpret the complex black-box model.
  
  The training process of a surrogate model is as follows:
  
  1. Select a data set X (it can be the same as the black-box model's training set or a different one; it does not matter).
  
  2. Use the trained black-box model to predict Y for X.
  
  3. Select an interpretable model, such as linear regression or a decision tree.
  
  4. Train the interpretable model on the data set X and the predicted Y from the previous step.
  
  5. Measure how well the interpretable model reproduces the black-box model's predictions.
  
  Surrogate models are very flexible, intuitive, and easy to implement. But a surrogate model is an interpretation of the black-box model, not an interpretation of the data.
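  A minimal sketch of the recipe above: a gradient-boosting model plays the black box, and a shallow decision tree is trained on its predictions (the R² between the two is a simple measure of how faithful the surrogate is):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor, export_text

# Steps 1-2: a data set X and the black box's predictions on it.
X, y = make_regression(n_samples=1000, n_features=4, random_state=0)
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)
y_black_box = black_box.predict(X)

# Steps 3-4: train an interpretable model on X and the black box's predictions.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_black_box)

# Step 5: check how closely the surrogate mimics the black box.
print("R^2 vs. black box:", r2_score(y_black_box, surrogate.predict(X)))
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(4)]))
```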
  
  Example-Based Explanations
  
  Counterfactual Explanations (Counterfactual)
  
  A counterfactual explanation is like saying, "If X had not happened, Y would not have happened."
  
  Counterfactual explanations establish a causal relationship between the features and the prediction result, as shown in the figure.
  
  We change one feature of a sample and then observe how the prediction changes. Google's What-If Tool can help us do this kind of analysis.
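  A crude, brute-force sketch of the idea (a real tool such as the What-If Tool is far more sophisticated): vary one feature of a sample until the model's decision flips.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0].copy()
original = model.predict([x])[0]

# Scan one feature (index 0 here, an arbitrary illustrative choice) over a grid,
# from the smallest change outward, and report the first change that flips the prediction.
for delta in sorted(np.linspace(-3, 3, 121), key=abs):
    candidate = x.copy()
    candidate[0] = x[0] + delta
    if model.predict([candidate])[0] != original:
        print(f"Changing feature 0 by {delta:+.2f} flips the prediction "
              f"from {original} to {model.predict([candidate])[0]}")
        break
else:
    print("No counterfactual found by changing feature 0 alone.")
```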
  
  I also recommend this book: The Book of Why: The New Science of Cause and Effect
  
  The Book of Why
  
  
  Adversarial Examples
  
  An adversarial example is a sample in which a tiny change to some feature value makes the model produce a completely wrong prediction. The goal of an adversarial example is to deceive the model; attacking a machine learning model often means finding such adversarial examples.
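  A toy sketch of the idea against a linear model (gradient-based attacks on deep networks, such as FGSM, follow the same logic): push the input a small step in the direction that most changes the model's decision score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0].copy()
pred = model.predict([x])[0]

# For a linear model, stepping against the weight vector changes the decision
# score fastest; eps is a hand-picked step size and may need to be increased
# before the prediction actually flips.
w = model.coef_[0]
eps = 0.5
x_adv = x + eps * (-np.sign(w) if pred == 1 else np.sign(w))

print("original prediction:   ", pred)
print("adversarial prediction:", model.predict([x_adv])[0])
```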
  
  Prototypes and Criticisms
  
  A prototype is a data point that can represent all the other points. A criticism is a data point that is not well represented by the set of prototypes.
  
  Influential Instances
  
  A machine learning model is the product of training on data, and removing any training point tends to change the training result. If deleting one particular training point has a huge impact on the model, we say that point is influential. Analyzing influential points can often help us explain the model.
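  A brute-force sketch of measuring influence by leave-one-out retraining (feasible only for small data and cheap models; dedicated methods such as influence functions avoid the retraining):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
X, y = X[:100], y[:100]            # keep it small so retraining 100 times is cheap
full_model = LinearRegression().fit(X, y)
full_pred = full_model.predict(X)

# Influence of point i: how much the model's predictions change
# when the model is retrained without point i.
influence = np.empty(len(X))
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    reduced = LinearRegression().fit(X[mask], y[mask])
    influence[i] = np.abs(reduced.predict(X) - full_pred).mean()

print("most influential training points:", np.argsort(influence)[-5:][::-1])
```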
  
  Summary
  
  This article has introduced the basic concepts and methods of interpretable machine learning; I hope it is helpful to readers.
