Causal inference concepts, revisited and organized

0x01. Background

Some time ago I read a survey of causal inference and wrote up brief notes on it. A few methods and details were still unfamiliar to me, so I went through the material again and reorganized the concepts I had not understood well.

0x02. Review of the basic concepts

  • Unit: The atomic object of study in causal inference. It can be a physical object or a concept, and it can be a single object or a group. In some frameworks, the same object at different points in time is treated as different units.
  • Treatment: The action applied to a unit, also called an intervention.
  • Variables: Attributes of the unit, such as a patient's age, gender, medical history, or blood pressure. Variables that are not affected by the treatment are called pre-treatment variables (a patient's gender, for example, is unchanged in most cases); variables that are affected are called post-treatment variables.
  • Confounder: A variable that affects both treatment selection and the outcome. For example, the same dose of a drug may produce different results in people of different ages, and people of different ages may also be given different drugs. In some literature it is also called a covariate.
  • Potential outcome: The outcome a unit would have under each possible treatment. Potential outcomes comprise the observed (factual) outcome and the counterfactual outcomes.
  • Factual outcome: The outcome actually observed under the treatment that was applied to the unit, denoted as $Y$.
  • Propensity score: $e(x) = \Pr(W_i = 1 \mid X_i = x)$, the probability that a sample with features $x$ receives the treatment.
  • Selection bias: Because of confounders, the treatment group and the control group may have different distributions, which introduces bias and makes inference more difficult.
  • Pre-treatment variables: Variables that are not affected by the intervention, also called background variables, such as the weather when the medicine is taken.
  • Post-treatment variables: Variables that are affected by the intervention, such as appetite in the medicine example.
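
To make the propensity score definition concrete, here is a minimal sketch that estimates e(x) as the treated fraction within each value of a single discrete covariate. The records and the name `propensity_by_stratum` are hypothetical, made up for illustration; with continuous or high-dimensional features one would typically fit a model instead.

```python
from collections import defaultdict

def propensity_by_stratum(records):
    """Estimate e(x) = Pr(W=1 | X=x) as the treated fraction within each x."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, w in records:
        total[x] += 1
        treated[x] += w  # w is 1 for treated, 0 for control
    return {x: treated[x] / total[x] for x in total}

# Hypothetical toy data: (age_group, W), where W=1 means the unit was treated.
records = [
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
]

scores = propensity_by_stratum(records)
print(scores)
```

In this toy data the old group is treated far more often than the young group, which is exactly the kind of imbalance that leads to selection bias.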

ATE (Average Treatment Effect): $\mathrm{ATE} = E[Y(W=1)] - E[Y(W=0)]$, the average treatment effect over the whole population.

ATT (Average Treatment Effect on the Treated group): $\mathrm{ATT} = E[Y(W=1) \mid W=1] - E[Y(W=0) \mid W=1]$, the average treatment effect for the treated population.

ATC (Average Treatment Effect on the Control group): Defined analogously to ATT, $\mathrm{ATC} = E[Y(W=1) \mid W=0] - E[Y(W=0) \mid W=0]$, the average treatment effect for the control population.

ITE (Individual Treatment Effect): $\mathrm{ITE}_i = Y_i(W=1) - Y_i(W=0)$, the treatment effect for a single unit $i$ (the individual level).

CATE (Conditional Average Treatment Effect): $\mathrm{CATE} = E[Y(W=1) \mid X=x] - E[Y(W=0) \mid X=x]$, the treatment effect in the subpopulation with features $X = x$.
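
The following sketch computes the estimands above on a toy population. It assumes something never available in practice: both potential outcomes are known for every unit, which is possible only in a simulation. All data and names (`unit`, `ite`, `observed`) are hypothetical.

```python
from statistics import mean

def unit(x, w):
    """Hypothetical unit: covariate x, treatment w, and BOTH potential outcomes."""
    y0, y1 = (1.0, 2.0) if x == "young" else (4.0, 7.0)
    return {"x": x, "w": w, "y0": y0, "y1": y1}

# Young units are mostly untreated, old units are mostly treated.
units = [
    unit("young", 0), unit("young", 0), unit("young", 0), unit("young", 1),
    unit("old", 0), unit("old", 1), unit("old", 1), unit("old", 1),
]

def ite(u):
    """Individual treatment effect: Y_i(W=1) - Y_i(W=0)."""
    return u["y1"] - u["y0"]

def observed(u):
    """Factual outcome: the potential outcome under the treatment actually received."""
    return u["y1"] if u["w"] == 1 else u["y0"]

ate = mean(ite(u) for u in units)
att = mean(ite(u) for u in units if u["w"] == 1)
atc = mean(ite(u) for u in units if u["w"] == 0)
cate = {x: mean(ite(u) for u in units if u["x"] == x) for x in ("young", "old")}

# Naive comparison of observed group means, which mixes in selection bias.
naive = (mean(observed(u) for u in units if u["w"] == 1)
         - mean(observed(u) for u in units if u["w"] == 0))

print(ate, att, atc, cate, naive)
```

Because age influences both who gets treated and the outcome, the naive difference of observed means (4.0) overstates the ATE (2.0) in this toy setup.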

0x03. Simpson's Paradox

Previously I only knew Simpson's paradox as the phenomenon where one side is ahead in every group yet behind in the overall comparison. On a second look, I learned that Simpson's paradox is a term from statistics. Roughly: to explore the relationship between two variables, people often analyze the data in groups, and the side that leads within every group can turn out to be the loser once the groups are combined.

Here is an example from Zhihu:

                100 men                               100 women
                Applied  Admitted  Acceptance rate    Applied  Admitted  Acceptance rate
Engineering     80       38        47.5%              20       14        70%
Liberal arts    20       2         10%                80       16        20%
Total           100      40        40%                100      30        30%

The table counts the applicants, admissions, and acceptance rates of men and women at two colleges. The Simpson's paradox hidden here is this: in both the College of Engineering and the College of Liberal Arts, the acceptance rate of men is lower than that of women, yet the overall acceptance rate of men (40%) is clearly higher than that of women (30%). When the admission results came out, people asked whether the school discriminated against women; but looking at the acceptance rates college by college, it appears to discriminate against men. So who is actually being discriminated against?
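
As a quick check of the numbers in the table, here is a small sketch that recomputes the per-college and overall acceptance rates. The helper name `acceptance_rates` is made up for illustration.

```python
# (applied, admitted) per college, copied from the table above.
admissions = {
    "men": {"Engineering": (80, 38), "Liberal arts": (20, 2)},
    "women": {"Engineering": (20, 14), "Liberal arts": (80, 16)},
}

def acceptance_rates(colleges):
    """Return (per-college rates, overall rate) for one group of applicants."""
    per_college = {name: admitted / applied
                   for name, (applied, admitted) in colleges.items()}
    total_applied = sum(applied for applied, _ in colleges.values())
    total_admitted = sum(admitted for _, admitted in colleges.values())
    return per_college, total_admitted / total_applied

men_rates, men_overall = acceptance_rates(admissions["men"])
women_rates, women_overall = acceptance_rates(admissions["women"])

for college in men_rates:
    print(college, men_rates[college], women_rates[college])
print("overall", men_overall, women_overall)
```

Women have the higher rate in each college, while men have the higher overall rate, which is the reversal described above.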

Why does this happen?

The essence of Simpson's paradox, or rather its first precondition, is that the success rates differ greatly between strata. In the admission data, for both men and women, the acceptance rate of the College of Engineering is much higher than that of the College of Liberal Arts (this may come from how the school allocates places across majors, making Engineering easier to get into than Liberal Arts).

Getting into the College of Engineering is easier; in short, applying there is the easier thing to do. Looking at the data, the two groups distribute themselves very differently across tasks of different difficulty, which is the second precondition of the paradox. More men applied to the College of Engineering (they chose the easier thing to do), and this produces the reversal when the overall success rate is computed.

To sum up: the more of your attempts go to tasks with a high success rate, the higher your overall success rate will be.
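
This summary can be written as a weighted average. In the sketch below the symbols are my own notation, not the source's: $n_s$ is the number of attempts in stratum $s$, $n$ the total number of attempts, and $r_s$ the success rate in stratum $s$. The numbers are taken from the admissions table.

```latex
\begin{align*}
r_{\text{overall}} &= \sum_{s} \frac{n_s}{n}\, r_s \\
\text{men:}\quad & 0.8 \times 47.5\% + 0.2 \times 10\% = 38\% + 2\% = 40\% \\
\text{women:}\quad & 0.2 \times 70\% + 0.8 \times 20\% = 14\% + 16\% = 30\%
\end{align*}
```

Women have the higher $r_s$ in each stratum, but 80% of their weight sits on the stratum with low rates, while 80% of the men's weight sits on the stratum with high rates.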

Quoting again from Zhihu’s geometric explanation:
Let the horizontal axis represent the number of attempts and the vertical axis the number of successes. Each task can then be represented by a point in the Cartesian plane; draw a vector from the origin to that point, and the slope of the vector is the success rate. Suppose a person does two tasks, with success rates k1 and k2. How is the overall success rate computed? Simply: overall success rate = total number of successes / total number of attempts. Geometrically, by the parallelogram rule, build a parallelogram with the two vectors as adjacent sides; the slope of its diagonal is the overall success rate.

This explains how a person whose success rate is lower than someone else's on each of the two tasks can still have the higher overall success rate.
Figure: A's success rate is lower than B's on both the first task and the second task, yet A's overall success rate is higher than B's.
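
The parallelogram construction can be checked numerically. This is a minimal sketch that treats each task as an (attempts, successes) vector and reuses the admissions numbers, with A standing for the men and B for the women; that mapping and the helper names `add` and `slope` are my own choices for illustration.

```python
def add(v1, v2):
    """Add two (attempts, successes) vectors component-wise (the diagonal)."""
    return (v1[0] + v2[0], v1[1] + v2[1])

def slope(v):
    """Success rate = successes / attempts, i.e. the slope of the vector."""
    attempts, successes = v
    return successes / attempts

# (attempts, successes) for the first and second task of each person.
a_tasks = [(80, 38), (20, 2)]   # A: long vector on the easy task
b_tasks = [(20, 14), (80, 16)]  # B: long vector on the hard task

a_total = add(*a_tasks)
b_total = add(*b_tasks)

print([slope(v) for v in a_tasks], slope(a_total))
print([slope(v) for v in b_tasks], slope(b_total))
```

The slope of the diagonal always lies between the two individual slopes and is pulled toward the longer vector, so A's diagonal stays close to the steep easy-task slope while B's is dragged down toward the shallow hard-task slope.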

Lesson from Simpson's paradox: to avoid it, the weights of the individual groups need to be considered carefully, and some coefficient should be used to remove the effect of differences between groups or strata. Although the data are objective facts, different people can tell different stories with the same data.
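
One common way to apply such weights is direct standardization: give both groups the same stratum weights before averaging. The sketch below is an illustration under that assumption, not necessarily the coefficient the quoted explanation had in mind. The weights are the pooled share of applicants per college in the admissions table (100 of 200 each), and `standardized_rate` is a made-up helper name.

```python
# Per-college acceptance rates taken from the admissions table.
college_rates = {
    "men": {"Engineering": 0.475, "Liberal arts": 0.10},
    "women": {"Engineering": 0.70, "Liberal arts": 0.20},
}

# Assumed common weights: pooled share of applicants per college
# (80 + 20 = 100 for Engineering, 20 + 80 = 100 for Liberal arts).
weights = {"Engineering": 0.5, "Liberal arts": 0.5}

def standardized_rate(group_rates, weights):
    """Weighted average of stratum rates using the SAME weights for every group."""
    return sum(weights[college] * group_rates[college] for college in weights)

adjusted = {group: standardized_rate(r, weights)
            for group, r in college_rates.items()}
print(adjusted)
```

With identical weights the reversal disappears: the men come out at about 28.75% and the women at about 45%, which agrees with the per-college comparison.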

0x04. To be continued

Origin blog.csdn.net/l8947943/article/details/129808453