Cause and Effect 2: The Potential Outcomes Framework

Abstract: How far do we need to go from causality to statistics?

In the last chapter, we started from the relationship between causality and statistics and got a first look at a classic causal framework: the potential outcomes framework. Today we continue studying this framework.

Figure 1 shows the "taking medicine to cure a headache" data table left over from last week. How do we find the average treatment effect (ATE) from this table?

Figure 1. "Take medicine to cure headache" data

Let us assume that this is data obtained from a randomized controlled trial, so that the statistical association equals the causal effect. We then obtain $ATE = \frac{1}{3}$, as shown in Figure 2.
(Image: https://i.loli.net/2021/04/07/TxyILYtArNeXDP7.png)

Figure 2. ATE solution 1
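Since the original table did not survive, here is a minimal sketch of the same calculation on hypothetical stand-in data: under randomization, the ATE is simply the difference between the treated and control group means.

```python
# Hypothetical stand-in for the Figure 1 table (the original image is lost);
# the numbers are chosen so the answer matches the chapter's ATE of 1/3.
treated_outcomes = [1, 1, 0]  # observed Y for units with T = 1
control_outcomes = [0, 1, 0]  # observed Y for units with T = 0

# In an RCT, association equals causation, so the difference in group
# means estimates ATE = E[Y(1)] - E[Y(0)].
ate = (sum(treated_outcomes) / len(treated_outcomes)
       - sum(control_outcomes) / len(control_outcomes))
print(ate)  # ≈ 1/3
```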

But as the previous chapter noted, very little data comes from randomized controlled trials, and real observational data is full of confounding variables, so $E[Y(1)] - E[Y(0)] = E[Y \mid T=1] - E[Y \mid T=0]$ does not hold.

To conduct observational studies, we must break through this barrier so that a causal expression can be completely transformed into a statistical expression. We call this property identifiability: if a causal quantity can be computed from purely statistical quantities, then that causal quantity is identifiable, which means we can recover the causal effect from observational data.

The potential outcomes framework rests on four assumptions:

  • unconfoundedness
  • positivity
  • no interference
  • consistency

Let us understand these four assumptions step by step based on the derivation of the ATE formula.

First, start from the initial form of the ATE. In fact, the simple expression $ATE = E[Y(1) - Y(0)]$ does not always make sense. For example, in the treatment of infectious diseases, treating a sick individual affects not only that individual's own outcome but also the outcomes of individuals who are not yet sick, so the causal effect cannot be computed with the individual patient as the unit.

No interference is the assumption that supports $ATE = E[Y(1) - Y(0)]$.

No interference

No interference between units, defined as:

$Y_i(t_1, \ldots, t_{i-1}, t_i, t_{i+1}, \ldots, t_n) = Y_i(t_i)$

That is, unit $i$'s potential outcome depends only on its own treatment $t_i$, not on the treatments assigned to other units. From a data perspective, we can think of this definition as ensuring that the treatments behave like elements of a set, satisfying the three defining properties of set elements: determinacy, distinctness, and unorderedness.

Only when this assumption is satisfied does $Y(1) - Y(0)$ make sense. Once the data satisfies the no-interference assumption, we can define the ATE as:

$E[Y(1) - Y(0)]$ (no interference)

$= E[Y(1)] - E[Y(0)]$ (linearity of expectation)

Because our goal is an observational study, we introduce the confounding variable X:

$= E_X[E[Y(1) \mid X] - E[Y(0) \mid X]]$ (law of iterated expectations)

At this point we have reached the boundary between causality and statistics; to go further, we need the next assumption.

Unconfoundedness

No confounding; the formula is:

$(Y_1, Y_0) \perp\!\!\!\perp T \mid X$

Before introducing it, let's first introduce ignorability.

Ignorability

Ignorability is also known as exchangeability. The formula is:

$(Y_1, Y_0) \perp\!\!\!\perp T$, i.e. the values of $Y_1, Y_0$ are independent of how $T$ is assigned.

We can understand this assumption from three perspectives.

First, based on simple intuition, we can understand it as the condition that T is assigned completely at random, as in a randomized controlled trial.

Understanding it from the perspective of ignorability: note the difference between $(Y_1, Y_0) \perp\!\!\!\perp T$ and $Y \perp\!\!\!\perp T$. $Y \perp\!\!\!\perp T$ means that Y and T are completely independent, with no causal effect at all. In $(Y_1, Y_0) \perp\!\!\!\perp T$, by contrast, the value of T is already fixed inside each potential outcome; the statement says that the distribution of the potential outcomes is independent of how T is assigned. We can therefore ignore the mechanism that assigns T, that is, treat T as completely random.

We can also understand it from the perspective of exchangeability, as shown in Figure 3: we can swap which units receive T=1 and T=0 at will, and the expectations of Y(1) and Y(0) will not change.

(Image: https://i.loli.net/2021/04/07/91Eut6VqBp8ScKs.png)

Figure 3. Swapability

At the same time, according to the derivation in Figure 3, as long as ignorability holds, $E[Y(1)] - E[Y(0)] = E[Y \mid T=1] - E[Y \mid T=0]$ holds. But this assumption ignores confounding variables: it is true in randomized controlled trials but does not fit the setting of an observational study.
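The claim above can be checked with a small simulation (a sketch with made-up outcome probabilities, not the chapter's data): potential outcomes are drawn first, then T is assigned at random, so ignorability holds by construction, and the naive group-mean difference matches the true ATE.

```python
import random

random.seed(0)

n = 100_000
# Draw the potential outcomes first (the probabilities 0.7 and 0.4 are
# arbitrary), then assign treatment independently of them:
# ignorability by construction.
y1 = [random.random() < 0.7 for _ in range(n)]  # P(Y(1) = 1) = 0.7
y0 = [random.random() < 0.4 for _ in range(n)]  # P(Y(0) = 1) = 0.4
t = [random.random() < 0.5 for _ in range(n)]   # completely random T

true_ate = sum(y1) / n - sum(y0) / n  # E[Y(1)] - E[Y(0)], about 0.3

# Observed data only reveal Y(1) for treated units and Y(0) for controls.
obs_t = [y1[i] for i in range(n) if t[i]]
obs_c = [y0[i] for i in range(n) if not t[i]]
naive = sum(obs_t) / len(obs_t) - sum(obs_c) / len(obs_c)

print(round(true_ate, 2), round(naive, 2))  # the two agree up to noise
```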

Unconfoundedness

Therefore, we introduce the confounding variables and obtain the conditional ignorability assumption, i.e. unconfoundedness, also known as conditional exchangeability, whose formula is:

$(Y_1, Y_0) \perp\!\!\!\perp T \mid X$

The formula is easy to understand: we condition on the relevant confounding variables, so that $(Y_1, Y_0)$ and $T$ are independent in the conditional distribution given X.

But with this assumption alone, we cannot carry the derivation further than:

$E[Y(1) - Y(0)]$ (no interference)

$= E[Y(1)] - E[Y(0)]$ (linearity of expectation)

$= E_X[E[Y(1) \mid X] - E[Y(0) \mid X]]$ (law of iterated expectations)

We still need one more assumption, because as Figure 4 shows, we cannot guarantee that the denominator marked in red is well defined.

(Image: https://i.loli.net/2021/04/07/ucwbEf852zkoF4p.png)

Figure 4. Positivity

Note that unconfoundedness is untestable: we cannot confirm from data whether unobserved confounders exist.

Positivity

Sometimes translated as "non-zero", though that is not very intuitive. Also known as overlap, it is defined as:

For every value x of the covariates with $P(X = x) > 0$, we have $0 < P(T = 1 \mid X = x) < 1$.

To solve the problem in Figure 4, we need this assumption to ensure that the denominator is always greater than 0, i.e. that positivity is satisfied. Under this assumption, the denominator in Figure 4 is always well defined. The practical meaning of the assumption is also easy to grasp. As shown in Figure 5, suppose that for a dataset we want to study the causal effect within the X=x subgroup, but find that every unit in the X=x subgroup belongs to the treatment group. Without a control group, no causal effect can be estimated.

(Image: https://i.loli.net/2021/04/07/mAwDefFCWv753oJ.png)

Figure 5. The positivity intuition

We can also understand this from the perspective of overlap. Consider the situation in Figure 5 and the range of $P(X \mid T)$: when X=x, T always equals 1; in other words, the support of $P(X \mid T=0)$ does not include X=x. We can draw a histogram accordingly, with the horizontal axis representing the values of x and the vertical axis representing $P(x \mid t)$, as shown in Figure 6. Panel (a) shows a complete violation of the positivity assumption: the distributions $P(X \mid T=0)$ and $P(X \mid T=1)$ have no overlapping region. Panel (b) shows partial compliance: only the overlapping region satisfies positivity. Panel (c) shows full compliance: every value of X in the data satisfies positivity.

(Image: https://i.loli.net/2021/04/07/ZDmHa1pGkOe6nNC.png)

Figure 6. Overlap
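For discrete data, checking positivity amounts to verifying that every covariate stratum contains both treated and control units. A minimal sketch (the helper name and the `(x, t)` pair format are hypothetical, not from the chapter):

```python
from collections import defaultdict

def positivity_violations(data):
    """Return the strata x where 0 < P(T=1 | X=x) < 1 fails.

    `data` is a list of (x, t) pairs; both the function name and the
    data layout are illustrative assumptions."""
    counts = defaultdict(lambda: [0, 0])  # x -> [control count, treated count]
    for x, t in data:
        counts[x][t] += 1
    return sorted(x for x, (n0, n1) in counts.items() if n0 == 0 or n1 == 0)

# Stratum x=2 contains only treated units, so positivity fails there --
# exactly the Figure 5 situation: no control group to compare against.
sample = [(1, 0), (1, 1), (2, 1), (2, 1), (3, 0), (3, 1)]
print(positivity_violations(sample))  # [2]
```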

The Positivity-Unconfoundedness Tradeoff

The trade-off between the positivity assumption and the unconfoundedness assumption.

Figure 6 shows how positivity can be violated when there is only one confounding variable X. Let's see what happens to positivity as the number of confounding variables gradually increases, as shown in Figure 7.
(Image: https://i.loli.net/2021/04/07/BLpHU9RhZiVIzWd.png)

Figure 7. Positivity and unconfoundedness

We find that as the number of covariates in X increases, the probability that the positivity assumption holds shrinks. But at the same time, the more variables X we control for, the more likely unconfoundedness is to be satisfied. This shows that positivity and unconfoundedness are hard to satisfy simultaneously, and in practice we often need to trade one off against the other.
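The tradeoff can be felt numerically. In the sketch below (an entirely made-up simulation, not the data behind Figure 7), units with random binary covariates are binned into strata; as the number of covariates grows, strata shrink, and the fraction of strata that still contain both groups collapses.

```python
import random

random.seed(1)

def overlap_fraction(n_units, n_covariates):
    """Fraction of observed covariate strata containing both treated and
    control units. A rough illustration of positivity eroding with
    dimensionality; the function name and numbers are hypothetical."""
    strata = {}
    for _ in range(n_units):
        x = tuple(random.randint(0, 1) for _ in range(n_covariates))
        t = random.randint(0, 1)  # completely random treatment
        strata.setdefault(x, set()).add(t)
    both = sum(1 for groups in strata.values() if groups == {0, 1})
    return both / len(strata)

for d in (1, 4, 8, 12):
    print(d, round(overlap_fraction(5000, d), 2))
# The printed fraction falls toward 0 as d grows: more covariates help
# unconfoundedness but starve each stratum of one of the two groups.
```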

Well, having introduced the positivity assumption, we can push the derivation one step further:

$E[Y(1) - Y(0)]$ (no interference)

$= E[Y(1)] - E[Y(0)]$ (linearity of expectation)

$= E_X[E[Y(1) \mid X] - E[Y(0) \mid X]]$ (law of iterated expectations)

$= E_X[E[Y(1) \mid T=1, X] - E[Y(0) \mid T=0, X]]$ (unconfoundedness and positivity)

There is one last step left: converting Y(1) to Y. Let's look at the last assumption.

Consistency

Consistency, definition:

If the treatment is T, then the observed outcome Y is the potential outcome under treatment T. That is, if $T = t$, then $Y = Y(t)$, which can be written directly as $Y = Y(T)$.

The definition may sound convoluted, but it is simple: for any t, Y(t) has a single well-defined value. For example, suppose we define T as drinking water and Y as immediate death, to study whether drinking water causes people to die immediately. If we let some people drink boiled water and others drink poisoned water, then for T=1, Y can take both values 0 and 1; in this case, consistency is not satisfied.

Combining this with no interference: mathematically, when both no interference and consistency hold, the map from T to Y is well defined, i.e. Y is a function of T. When Y is a function of T, we say that SUTVA (the stable unit treatment value assumption) is satisfied.

Adjustment Formula

That is, the formula obtained by adjusting for the confounders.

With the consistency assumption in place, we can finally build the complete bridge from causality to statistics and compute the causal effect from observed data. The full derivation:

$E[Y(1) - Y(0)]$ (no interference)

$= E[Y(1)] - E[Y(0)]$ (linearity of expectation)

$= E_X[E[Y(1) \mid X] - E[Y(0) \mid X]]$ (law of iterated expectations)

$= E_X[E[Y(1) \mid T=1, X] - E[Y(0) \mid T=0, X]]$ (unconfoundedness and positivity)

$= E_X[E[Y \mid T=1, X] - E[Y \mid T=0, X]]$ (consistency)

We call this formula, which builds the bridge from causality to statistics, the adjustment formula, or the identification formula for the ATE:

$E[Y(1) - Y(0)] = E_X[E[Y \mid T=1, X] - E[Y \mid T=0, X]]$
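The adjustment formula translates directly into a stratify-and-average estimator. The sketch below uses a made-up six-row dataset in which the confounder X drives both T and Y; the naive difference in means is biased, while the adjusted estimate recovers the ATE (assuming, as in the derivation above, all four assumptions hold and every stratum contains both groups).

```python
from collections import defaultdict

def adjustment_ate(rows):
    """Estimate E_X[ E[Y|T=1,X] - E[Y|T=0,X] ] from (x, t, y) rows.

    Requires positivity: every stratum of X must contain both treated and
    control units, or the per-stratum means are undefined. The function
    name and data layout are illustrative assumptions."""
    by_x = defaultdict(lambda: {0: [], 1: []})
    for x, t, y in rows:
        by_x[x][t].append(y)
    n = len(rows)
    ate = 0.0
    for groups in by_x.values():
        n_x = len(groups[0]) + len(groups[1])
        mean1 = sum(groups[1]) / len(groups[1])  # E[Y | T=1, X=x]
        mean0 = sum(groups[0]) / len(groups[0])  # E[Y | T=0, X=x]
        ate += (n_x / n) * (mean1 - mean0)       # weight by P(X = x)
    return ate

# Toy confounded data: x pushes units toward treatment and toward Y = 1.
rows = [(0, 0, 0), (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1), (1, 1, 1)]

treated = [y for _, t, y in rows if t == 1]
control = [y for _, t, y in rows if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(naive)                 # ~0.67: biased upward by the confounder
print(adjustment_ate(rows))  # 0.5: the adjusted estimate
```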

Okay, this chapter ends here. Its content is very important: this is the first model we have learned for finding causal effects. The formulas may look numerous, but they are all derivations at a junior-high-school level of mathematics. Students who are interested are encouraged to walk through the reasoning behind the adjustment formula themselves. As shown in Figure 8, we identify the causal quantity to obtain a statistical quantity, and then estimate that statistical quantity from data to obtain the result.

(Image: https://i.loli.net/2021/04/07/1OenNGbFMxsdDXf.jpg)

Figure 8. Causal inference process

Of course, it doesn't matter if you still feel a little hazy. Starting from the next chapter, we will introduce a more intuitive framework: the causal diagram.

Reference

Neal, Brady. Introduction to Causal Inference from a Machine Learning Perspective.

Origin: blog.csdn.net/euzmin/article/details/116500882