[Deep Learning] Advanced Practice of Causal Inference and Machine Learning | Mathematical Modeling

Article directory

  • Causal Inference
  • The Past and Present of Causal Inference
    • (1) Potential Outcome Framework
    • (2) Structural Causal Model (SCM)

Machine learning practitioners are fortunate to be working in an era of explosive growth in artificial intelligence, and how to better integrate AI into every aspect of human life is one of the important problems of this era. Wang Congying, a senior algorithm engineer at Didi International, found that many newcomers enter the industry having memorized advanced model theory, yet cannot find the right approach or grasp the key points when applying it in practice, so their knowledge never translates into real business gains. It would be invaluable to have a technical book that guides newcomers from entry level to mastery and from theory to practice; it would save companies the cost of training new hires while leaving newcomers room to learn and grow on their own.

With this original intention in mind, Mr. Wang spent nearly a year of his spare time reviewing and summarizing his own and his colleagues' growth from beginners to qualified algorithm engineers, together with their project experience, and finally wrote the book "Advanced Practice of Machine Learning: Computational Advertising, Supply and Demand Forecasting, Intelligent Marketing, and Dynamic Pricing", hoping that his experience can genuinely help readers interested in machine learning algorithms.

Advanced Practice of Machine Learning: Computational Advertising, Supply and Demand Forecasting, Intelligent Marketing, and Dynamic Pricing

Authors: Wang Congying, Xie Zhihui

Causal Inference

Causal inference is an emerging branch of machine learning in recent years. It mainly addresses "which came first, the chicken or the egg" questions. The main difference between causal inference and correlation is this: causal inference tries to infer the effect of a variable X on an outcome Y from changes in X, whereas correlation only describes how variables move together. For example, if two variables X and Y are correlated and Y increases as X increases, they are positively correlated; if Y decreases as X increases, they are negatively correlated. Causation and correlation are therefore essentially different. To help readers understand, here is an example:
A study shows that people who eat breakfast weigh less than those who do not, so "experts" conclude that eating breakfast helps you lose weight. In fact, there may be only a correlation between eating breakfast and lower weight, not causation. People who eat breakfast may also have a whole set of healthy habits, such as eating regular meals, exercising regularly, and getting enough sleep, which ultimately lead to their lower weight. Figure 1 shows the confounding factor in causal inference, describing the relationship among a healthy lifestyle, eating breakfast, and lower weight.
[Figure 1: A confounding factor (healthy lifestyle) behind eating breakfast and lower weight]
Clearly, people with a healthy lifestyle tend to eat breakfast, and a healthy lifestyle also leads to lower weight, so a healthy lifestyle is a common cause of both eating breakfast and lower weight. Precisely because such a common cause exists, we cannot simply conclude that there is a causal relationship between eating breakfast and lower weight; the "experts" conclusion is therefore hasty. Eating breakfast and lower weight are merely correlated, not causally related, and a common cause that blocks causal inference in this way is called a confounding factor (confounder). As shown on the right side of Figure 1, eliminating confounding factors, finding the causal relationship between two variables, and quantifying how much a change in the independent variable X affects the dependent variable Y are the main content of causal inference.
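As a hedged illustration (my own toy simulation, not from the book), the following Python sketch reproduces the breakfast story: a hidden "healthy lifestyle" variable drives both eating breakfast and lower weight, so the two appear correlated even though, by construction, breakfast has no causal effect on weight.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden confounder: healthy lifestyle (1 = healthy, 0 = not)
healthy = rng.binomial(1, 0.5, n)

# Healthy people are more likely to eat breakfast...
breakfast = rng.binomial(1, 0.3 + 0.5 * healthy)

# ...and weigh less, but breakfast itself has NO effect on weight here
weight = 75 - 8 * healthy + rng.normal(0, 5, n)

# Naive comparison: breakfast eaters look lighter (pure confounding)
naive_diff = weight[breakfast == 1].mean() - weight[breakfast == 0].mean()
print(f"naive difference: {naive_diff:.2f} kg")        # clearly negative

# Comparing within each lifestyle group removes the confounding
for h in (0, 1):
    m = healthy == h
    diff = weight[m & (breakfast == 1)].mean() - weight[m & (breakfast == 0)].mean()
    print(f"lifestyle={h}: difference {diff:.2f} kg")   # close to 0
```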

The Past and Present of Causal Inference


(1) Potential Outcome Framework

Before introducing the potential outcome framework, we first list the two assumptions needed to define individual causal effects. In addition, to help readers get started faster, this article only discusses binary treatment: an individual either receives the treatment or does not, with a corresponding potential outcome under each of the two cases.
However, in the real world, at any given moment an individual either receives the treatment or does not; it cannot both receive and not receive the treatment at the same time. Therefore, the individual causal effect is not identifiable: for each individual, only the outcome under the treatment actually received can be observed, while the other potential outcome is missing.
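To make this "fundamental problem" concrete, here is a tiny illustrative sketch (my own toy numbers, not the book's): every individual has two potential outcomes Y(1) and Y(0), but only the one matching the treatment actually received is observed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5

# Hypothetical potential outcomes (never jointly observable in reality)
y0 = rng.normal(10, 1, n)            # outcome if NOT treated
y1 = y0 + 2                          # outcome if treated (true individual effect = 2)
t = rng.binomial(1, 0.5, n)          # treatment actually received

observed = np.where(t == 1, y1, y0)  # only one potential outcome is seen per individual

df = pd.DataFrame({"T": t, "Y(0)": y0, "Y(1)": y1, "Y_observed": observed})
print(df.round(2))
# The potential outcome the individual did not receive is a missing counterfactual.
```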

Given that the individual causal effect cannot be identified, how can we still conduct causal inference? An effective solution is to shift the identification of causal effects from the individual to the population, which leads to the concept of the average treatment effect (ATE, Average Treatment Effect). The average causal effect no longer compares individuals, but compares the potential outcomes of two groups under different treatments. Apart from receiving different treatments, the two groups must have homogeneous attributes; only then is the estimated average causal effect unbiased. A randomized controlled trial (RCT) is the basic experimental method that guarantees this unbiasedness: the data is randomly split into a treatment group (Treatment Group, T = 1) and a control group (Control Group, T = 0), and the average treatment effect is then given by the following formula:


$\mathrm{ATE} = E[Y(1)] - E[Y(0)] = E[Y \mid T = 1] - E[Y \mid T = 0]$

where Y(1) and Y(0) are the potential outcomes of the treatment group under treatment and of the control group without treatment, respectively. At this point, the basic theory of causal inference under the potential outcome framework has been covered; it can be summarized in the following two points.
1) Randomized controlled trials ensure the homogeneity of groups.

2) Move from the unidentifiable individual causal effect to estimating the population-level average causal effect.
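As a quick numerical check of these two points, here is a minimal sketch (my own toy data, not from the book) of estimating the ATE from a randomized controlled trial: because T is assigned completely at random, the difference of group means is an unbiased estimate of E[Y(1)] − E[Y(0)].

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

# Randomized treatment assignment keeps the two groups homogeneous on average
t = rng.binomial(1, 0.5, n)

# True causal effect of the treatment on the outcome is +3
y = 20 + 3 * t + rng.normal(0, 4, n)

# Difference of group means estimates the ATE
ate_hat = y[t == 1].mean() - y[t == 0].mean()
print(f"estimated ATE: {ate_hat:.2f}")   # close to the true value 3
```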

(2) Structural Causal Model (SCM)

A directed acyclic graph (DAG) consists of nodes and directed edges; the node at the tail of a directed edge is the parent, and the node the edge points to is the child. In a DAG, each node is independent of its non-descendants given its parents. Using the chain rule of probability together with this conditional independence, the joint probability distribution of all nodes in a directed acyclic graph can be expressed as:
$P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{pa}(x_i))$
where pa(x_i) denotes the set of parent nodes of x_i, i.e. all nodes with an edge pointing into x_i. To help readers better understand the joint-distribution expression of a directed acyclic graph, a concrete DAG example is given in Figure 2.

[Figure 2: An example DAG]
According to the conditional independence encoded in the DAG and the factorization formula above, the joint distribution of the DAG in Figure 2 can be written as the product of each node's conditional probability given its parents.
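To make the factorization concrete, here is a small sketch with a toy chain A → B → C (my own illustrative probabilities, not the Figure 2 example from the book):

```python
# Joint distribution of the toy DAG  A -> B -> C,
# factorized as P(A, B, C) = P(A) * P(B | A) * P(C | B).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) via the DAG factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Sanity check: the factorized joint sums to 1 over all configurations
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(1, 1, 0), total)   # 0.128 1.0
```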
Each directed acyclic graph induces a unique joint distribution, but a joint distribution does not necessarily correspond to a unique directed acyclic graph. For example, the same joint distribution P(A, B) can arise from either the structure A → B or the structure B → A, and the causal relationships expressed by the two structures are completely opposite; this is why a Bayesian network by itself is not a causal model. To turn a DAG into a causal graph that can express causal relationships, the do operator needs to be introduced. The do operator expresses an intervention: do(X = x) means cutting all directed edges pointing into node X and assigning X the constant value x. After the do intervention, the joint probability distribution of the DAG changes to the truncated factorization:

$P(x_1, \dots, x_n \mid do(X_j = x_j^*)) = \prod_{i \neq j} P(x_i \mid \mathrm{pa}(x_i))$, with $X_j$ fixed at the value $x_j^*$.
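A minimal sketch of this truncated factorization, reusing the same toy chain A → B → C (my own illustrative tables, not the book's): do(B = 1) deletes the factor P(B | A) and fixes B, leaving P(A, C | do(B = 1)) = P(A) · P(C | B = 1).

```python
# Toy chain A -> B -> C with P(A, B, C) = P(A) * P(B | A) * P(C | B)
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

def joint_do_b(a, c, b_star=1):
    """P(A=a, C=c | do(B=b_star)): the factor P(B | A) is removed and B is fixed."""
    return p_a[a] * p_c_given_b[b_star][c]

# Marginal of C under the intervention do(B=1)
p_c1_do = sum(joint_do_b(a, 1) for a in (0, 1))
print(p_c1_do)        # 0.6  (= p_c_given_b[1][1])

# A is unaffected by the intervention, because the cut edges pointed INTO B, not out of it
p_a1_do = sum(joint_do_b(1, c) for c in (0, 1))
print(p_a1_do)        # 0.4  (= p_a[1], unchanged)
```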
[Figure 3: The three basic path structures: chain, fork, and inverted fork (collider)]

Among the three path structures in Figure 3, namely the chain, the fork, and the inverted fork (collider structure), A and C in the collider structure are naturally independent of each other, and B is called a collider. In the chain and fork structures, conditioning on B blocks the association between A and C, making A and C independent of each other. d-separation is the operation of blocking these different path structures in order to make variables independent; the specific d-separation rules are summarized as follows.
1) When two arrows on a path both point into the same variable, that variable is called a collider, and the path is blocked by the collider.
2) If a path contains a non-collider, the path can be blocked by conditioning on that non-collider.
3) When a path is conditioned on a collider, the path is not blocked; on the contrary, it is opened.

Note that conditioning on a variable means fixing its value. For example, conditioning on a binary age variable means fixing age to 0 or 1.
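A small simulation sketch (my own toy data) of rules 1) and 3): in the collider structure A → B ← C, A and C are marginally independent, but conditioning on the collider B (here, selecting a narrow range of B) opens the path and creates a spurious association.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Collider structure: A -> B <- C
a = rng.normal(size=n)
c = rng.normal(size=n)
b = a + c + rng.normal(scale=0.5, size=n)

print(np.corrcoef(a, c)[0, 1])              # ~0: A and C are independent

# Condition on the collider by selecting a narrow slice of B
sel = np.abs(b) < 0.2
print(np.corrcoef(a[sel], c[sel])[0, 1])    # clearly negative: the path is opened
```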
Once we understand that the d-separation rules can block paths by conditioning on certain variables and thereby make variables independent, we can combine them with the backdoor criterion to eliminate confounding factors and perform causal inference on a causal graph. Before stating the backdoor criterion, two concepts are needed: backdoor paths and frontdoor paths. A backdoor path from variable X to variable Y is a path connecting X and Y whose arrow does not start from X; the corresponding frontdoor path is a path connecting X and Y whose arrow does start from X. If d-separation blocks all backdoor paths between X and Y, the causal relationship from X to Y is identifiable, and the variables that block the backdoor paths are the confounding factors. With the backdoor criterion, we do not need to observe all variables; it is enough to find which variables can be conditioned on to block the backdoor paths, and the causal effect of X on Y can then be identified.
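To illustrate the backdoor criterion numerically, here is a hedged sketch with an assumed confounded structure Z → X, Z → Y, X → Y (my own toy data, not from the book): the naive difference of means is confounded, while the backdoor adjustment E[Y | do(X=1)] − E[Y | do(X=0)] = Σ_z P(z) (E[Y | X=1, z] − E[Y | X=0, z]) recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000

# Confounder Z affects both treatment X and outcome Y; true effect of X on Y is +2
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.6 * z)
y = 2 * x + 5 * z + rng.normal(0, 1, n)

# Naive (confounded) comparison of means
naive = y[x == 1].mean() - y[x == 0].mean()

# Backdoor adjustment: average the within-stratum contrasts, weighted by P(z)
adjusted = sum(
    (z == v).mean() * (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean())
    for v in (0, 1)
)
print(f"naive: {naive:.2f}, backdoor-adjusted: {adjusted:.2f}")  # ~5.0 (biased) vs ~2.0 (true)
```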

Source: blog.csdn.net/Why_does_it_work/article/details/134613216