DPSS Quant 1.1 Causality: University of Chicago (UChicago) Summer School, Mathematical Statistics

1.1 Causality

1.1.1 Motivation

To explain the real world and to guide decisions in it, we need to know which relationships in the data are causal.

1.1.1.1 Examples

In 1665, the plague broke out in London. At the time, people speculated that the plague was caused by animals. As a result, more than 40,000 dogs and five times as many cats were killed.

Data from the time showed that the number of cats was positively correlated with the number of plague deaths, while smoke was negatively correlated with the number of plague deaths.


So killing the cats does not look unfounded, but the real picture is different:


Plague deaths are higher in places where there are rats and fleas, and because there are rats, there are also more cats in those places. Killing cats will only increase the number of deaths!

1.1.1.2 Examples

In 1854, the London doctor John Snow analyzed the distribution of patients, inferred that a water source was contaminated, and successfully prevented the infection from spreading further.

1.1.2 Rubin Model

For a given unit (research object) $i$ and an intervention $t$, let $S \in \{t, c\}$ denote the state of the world for $i$, where $t$ is the treatment condition (the intervention is applied) and $c$ is the control condition (the intervention is not applied). The corresponding outcome (the measure of interest) is written $Y$, with the subscript $i$ indexing the unit and the superscript $S$ indicating whether the intervention is applied, giving $Y_i^t$ and $Y_i^c$.

The causal effect is the difference between the two potential outcomes $Y_i^t$ and $Y_i^c$:

$$Y_i^t - Y_i^c = T_i$$

$T_i$ is also called the TE (treatment effect) of $t$ relative to $c$.

However, there is a core problem, the fundamental problem of causal inference: only one of the two potential outcomes can ever be seen. It is impossible to observe both outcomes in the real world at the same time. (This means that the TE of any individual unit is unknowable.)
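The following minimal sketch illustrates the point with made-up numbers. The `Unit` class and its values are hypothetical and only serve as an illustration: in a simulation we can store both potential outcomes, but the "real world" view exposes only the one selected by $S$.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One unit i with both potential outcomes (knowable only in a simulation)."""
    y_t: float  # Y_i^t: outcome if the intervention is applied
    y_c: float  # Y_i^c: outcome if the intervention is not applied
    s: str      # realized state S, either "t" or "c"

    def observed(self) -> float:
        # In the real world only the outcome matching S is seen.
        return self.y_t if self.s == "t" else self.y_c

    def true_effect(self) -> float:
        # T_i = Y_i^t - Y_i^c; needs both outcomes, so it is never observable.
        return self.y_t - self.y_c


# Hypothetical units, not data from the lecture.
units = [Unit(5.0, 3.0, "t"), Unit(2.0, 4.0, "c"), Unit(6.0, 6.0, "t")]

for i, u in enumerate(units, start=1):
    missing = "Y^c" if u.s == "t" else "Y^t"
    print(f"unit {i}: S={u.s}, observed={u.observed()}, never observed: {missing}")
```
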

1.1.2.1 Estimator

Because $T_i$ is unknowable for any given $i$, we give up on computing each specific $T_i$ (the TE of the intervention on $i$) and instead consider the ATE (average treatment effect):

$$T = E(Y_i^t - Y_i^c) = E(T_i)$$

(This formula assumes that $T_i$ is a simple random sample from the research objects, or the corresponding random variable.)

$T$ is also unobservable, but in practice we can estimate it with

$$\hat T = \hat E(Y_i^t \mid S=t) - \hat E(Y_i^c \mid S=c)$$

Whether $\hat T$ is a good estimate of $T$ depends on the sampling mode (the assignment mechanism).

If the following conditions are met, $\hat T$ can be considered a good estimate:

$$E(Y_i^t) = E(Y_i^t \mid S=t) \\ E(Y_i^c) = E(Y_i^c \mid S=c)$$

In that case

$$E(\hat T) = T$$

When $S$ and $Y$ are independent of each other, the above conditions are met. (Note: with a completely random assignment of $S$ this seems reasonable, but it cannot be verified.)
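As a rough check of this claim, here is a small simulation sketch. Everything in it is an assumption made for illustration: the normal distributions, the sample size, and the function name `simulate` are not from the lecture. $S$ is assigned by a fair coin flip, so it is independent of both potential outcomes, and the difference in group means should land close to the true ATE.

```python
import random
from statistics import mean


def simulate(n: int = 50_000, seed: int = 0):
    """Return (true ATE, difference-in-means estimate) under random assignment."""
    rng = random.Random(seed)
    # Hypothetical potential outcomes: Y^c ~ N(10, 2), T_i ~ N(2, 1), Y^t = Y^c + T_i.
    y_c = [rng.gauss(10.0, 2.0) for _ in range(n)]
    effects = [rng.gauss(2.0, 1.0) for _ in range(n)]
    y_t = [c + e for c, e in zip(y_c, effects)]
    # Completely random assignment: S does not depend on (Y^t, Y^c).
    s = [rng.choice("tc") for _ in range(n)]

    true_ate = mean(effects)  # knowable only because this is a simulation
    treated = [yt for yt, si in zip(y_t, s) if si == "t"]  # observed Y^t | S = t
    control = [yc for yc, si in zip(y_c, s) if si == "c"]  # observed Y^c | S = c
    t_hat = mean(treated) - mean(control)
    return true_ate, t_hat


true_ate, t_hat = simulate()
print(f"true ATE = {true_ate:.3f}, estimate = {t_hat:.3f}")
```

The two printed numbers should be close but not identical, because $\hat T$ still carries sampling noise even when it is unbiased.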

The formulas above all come from the lecture slides (ppt). It is easy to see that they are not spelled out in detail. For example, $E(\hat E(Y_i^t \mid S=t))$ is taken to be equal to $E(Y_i^t \mid S=t)$ because of some unspecified good estimation method, and $i$ is regarded as a fixed object while $T_i$ is treated as a sample, which also makes the formulas confusing.
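One way to fill in the missing step, as a sketch rather than the slides' own argument, is to assume that $\hat E$ is the sample mean within each group (so it is unbiased for the conditional expectation) and that the conditions $E(Y_i^t) = E(Y_i^t \mid S=t)$ and $E(Y_i^c) = E(Y_i^c \mid S=c)$ hold. Then:

$$
\begin{aligned}
E(\hat T) &= E\big(\hat E(Y_i^t \mid S=t)\big) - E\big(\hat E(Y_i^c \mid S=c)\big) \\
&= E(Y_i^t \mid S=t) - E(Y_i^c \mid S=c) && \text{(sample means are unbiased, by assumption)} \\
&= E(Y_i^t) - E(Y_i^c) && \text{(the two conditions)} \\
&= E(Y_i^t - Y_i^c) = T
\end{aligned}
$$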

1.1.2.2 Assignment Mechanism

Rubin's "Perfect Doctor"

Suppose we are studying a group of patients with a certain disease. There is a surgical treatment for this disease that is beneficial to some people but harmful to others. There is a perfect doctor who knows each patient's outcome both with and without surgery, so he always operates on the people who need surgery and never on the people who do not.

[Table 1, image not preserved: potential outcomes of each patient with and without surgery]

After the experiment, we get the following results:

[Table 2, image not preserved: the outcomes actually observed]

From the first table, we find that the true average effect of the operation is negative (patients lose 1 year of life on average). But when we estimate $T$ from the second table, the result is that the operation has a positive effect on patients on average (life is extended by 3 years on average).

The doctor always tailors the decision to the individual and assigns each patient the state that is best for them.
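The mechanism can be reproduced with a toy example. The four patients in this sketch are made-up numbers, not the values from the original tables. They were chosen only so that the true ATE comes out at a loss of 1 year and the naive estimate at a gain of 3 years, matching the figures quoted in the text.

```python
from statistics import mean

# Hypothetical patients: (Y^t, Y^c) = years of life with / without surgery.
patients = [(10, 7), (8, 6), (2, 6), (1, 6)]

# True ATE uses both potential outcomes of every patient (simulation only).
true_ate = mean(yt - yc for yt, yc in patients)

# The perfect doctor operates exactly on those who benefit from surgery.
assignment = ["t" if yt > yc else "c" for yt, yc in patients]

# What we actually observe: Y^t for operated patients, Y^c for the rest.
treated = [yt for (yt, _), s in zip(patients, assignment) if s == "t"]
control = [yc for (_, yc), s in zip(patients, assignment) if s == "c"]
naive_estimate = mean(treated) - mean(control)

print("assignment:", assignment)
print("true ATE:", true_ate)
print("naive estimate:", naive_estimate)
```

Because the assignment depends on the potential outcomes themselves, $S$ and $Y$ are not independent, and the difference in observed group means has the wrong sign.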

In real life, we often have prior knowledge about the social problem at hand (for example, we often know the income and consumption of recipients before granting relief). This knowledge helps us target an intervention better, but it can also make our estimates of the average effect severely biased.

1.1.3 Non-Causal Relationships

Non-causal relationships are often very important in their own right. We will learn the mathematical statistics methods for estimating non-causal relationships later.

Generally speaking, we first want to figure out the relationship between variables, and then try to figure out the essence behind the appearance, which may be cause and effect.

  • For example, in the case of the plague, we first discovered the positive correlation between cats and the number of plague deaths, and only then found the causal relationship between the two: the chain of influence cat → rat → flea → plague → death
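The cat example can be mimicked with a toy simulation. The coefficients, noise levels, and variable names here are invented for illustration and are not estimates from historical data. Rats drive both the number of cats and the number of deaths, while cats themselves reduce deaths, yet the raw correlation between cats and deaths is still strongly positive.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


rng = random.Random(1)
n = 5000  # hypothetical number of districts

rats = [rng.uniform(0, 100) for _ in range(n)]
# More rats attract more cats (assumed coefficient 0.5).
cats = [0.5 * r + rng.gauss(0, 5) for r in rats]
# Rats raise deaths; each cat LOWERS deaths (assumed coefficients +2 and -1).
deaths = [2.0 * r - 1.0 * c + rng.gauss(0, 10) for r, c in zip(rats, cats)]

r_cats_deaths = pearson(cats, deaths)
print(f"corr(cats, deaths) = {r_cats_deaths:.2f}")
```

In this toy model removing cats would raise deaths, even though cats and deaths move together in the data.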

1.1.4 Conclusion

  • Not all relationships in data are causal relationships

  • Even if there is a causal relationship, a reasonable sampling method is needed to overcome the fundamental problem of causal inference (that is, the two potential outcomes cannot be observed at the same time)

  • Look at the problem objectively: people often try to force an explanation onto the data and impose causality on it, which is often unreasonable! (For example, labeling cats as spreaders of the plague is absurd)


Origin blog.csdn.net/Kaiser_syndrom/article/details/108572986