What the hell is the synthetic control method? A guide to cutting-edge methods of causal inference

Today our "Causal Inference Research Group" introduces a widely used causal inference method, the synthetic control method (SCM), to friends in the econometric circle. The example we hear most often is the anti-smoking bill that California implemented in 1989: did the bill reduce tobacco consumption? Because the bill took effect only in California, the traditional difference-in-differences (DID) method is hard to apply, since the treatment group has a single member, California.

We still want a causal answer for this kind of problem, which is why the synthetic control method was developed and quickly became popular. The basic idea of SCM is to take a weighted average of the 38 other states that did not implement the anti-smoking bill and use it to synthesize a "California". The policy effect is then identified by comparing cigarette consumption in the real California with cigarette consumption in the synthetic California after the bill took effect in 1989.
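The idea can be written down compactly. The notation below is our own assumption (the original post gives no formula): unit 1 is California, units 2 to 39 are the 38 control states, Y_{jt} is per capita cigarette sales in state j and year t, and the weights w_j are assumed to be non-negative and to sum to one, as in the usual formulation of SCM.

```latex
\hat{Y}^{\text{synth}}_{1t} = \sum_{j=2}^{39} w_j \, Y_{jt},
\qquad w_j \ge 0, \qquad \sum_{j=2}^{39} w_j = 1,
\qquad
\hat{\alpha}_{1t} = Y_{1t} - \hat{Y}^{\text{synth}}_{1t}
\quad \text{for } t \ge 1989 .
```

Here \hat{\alpha}_{1t} is the estimated policy effect in year t: actual California minus synthetic California.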

A few key variables and options matter most in the synthetic control method, so we explain them separately. Below we apply SCM to a smoking data set to estimate the impact of California's 1989 anti-smoking bill on the state's cigarette consumption (sales).

synth cigsale beer(1984(1)1988) lnincome retprice age15to24 cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) fig replace keep(resout)

Explanation of the variables and options in the command above:
1. Dependent variable y: cigsale (per capita cigarette sales, packs/year);
2. Predictors x: beer (per capita beer consumption; beer(1984(1)1988) averages it over 1984-1988), lnincome (log of per capita income), retprice (cigarette retail price), age15to24 (proportion of the population aged 15-24); cigsale(1988), cigsale(1980) and cigsale(1975) are per capita cigarette sales in 1988, 1980 and 1975;
3. trunit(3): the policy-affected group (California); state 3 is California in the data;
4. trperiod(1989): the anti-smoking bill took effect in 1989;
5. fig: draw the figure comparing California with synthetic California;
6. keep(resout): save the synthesis results to the file resout in the working directory; replace overwrites that file if it already exists.

The results generated by the SCM estimation show that synthetic California's per capita cigarette consumption is a weighted combination of per capita cigarette consumption in the control states. Colorado receives a weight of 0.285 (the largest), because its characteristic variables x before the policy started are relatively close to those of the policy-affected state (California). In other words, within the control group Colorado looks the most like California.

Note: In essence, the search for the optimal weights before policy implementation is an application of the "matching" method we have often used before. SCM constructs a distance between the treated unit and the weighted control units, |Y1t(treated) - W*Y0t(controls)|, and chooses the weights W so that the prediction over the pre-policy period has the smallest mean squared prediction error (MSPE).
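To make the weight search concrete, here is a minimal Python sketch. It is not what the synth command does internally: it matches on the outcome series only, ignores the predictor variables, and uses plain projected gradient descent. The three control series and the function names (project_to_simplex, mspe, fit_weights) are made up for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    ordered = sorted(v, reverse=True)
    running_sum, shift = 0.0, 0.0
    for k, value in enumerate(ordered, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / k
        if value - candidate > 0:
            shift = candidate
    return [max(x - shift, 0.0) for x in v]


def mspe(treated, controls, w):
    """Mean squared prediction error of the weighted controls for the treated series."""
    total = 0.0
    for t in range(len(treated)):
        synthetic = sum(w[j] * controls[j][t] for j in range(len(controls)))
        total += (treated[t] - synthetic) ** 2
    return total / len(treated)


def fit_weights(treated, controls, steps=20000):
    """Minimize the pre-policy MSPE over non-negative weights that sum to one."""
    n_units, n_years = len(controls), len(treated)
    # Conservative step size: 1 / (upper bound on the curvature of the MSPE).
    lr = n_years / (2.0 * sum(x * x for series in controls for x in series))
    w = [1.0 / n_units] * n_units
    for _ in range(steps):
        resid = [sum(w[j] * controls[j][t] for j in range(n_units)) - treated[t]
                 for t in range(n_years)]
        grad = [2.0 / n_years * sum(resid[t] * controls[j][t] for t in range(n_years))
                for j in range(n_units)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(n_units)])
    return w


# Hypothetical pre-policy outcome series for three control units.
controls = [
    [120.0, 118.0, 115.0, 110.0, 108.0],
    [90.0, 92.0, 91.0, 89.0, 88.0],
    [140.0, 150.0, 160.0, 170.0, 180.0],
]
# Toy treated unit built as 0.6 * first control + 0.4 * second control.
treated = [0.6 * a + 0.4 * b for a, b in zip(controls[0], controls[1])]

weights = fit_weights(treated, controls)
equal_weights = [1.0 / 3] * 3
print([round(x, 3) for x in weights])
print(mspe(treated, controls, weights), mspe(treated, controls, equal_weights))
```

Because the toy treated series is an exact mix of the first two controls, the fitted weights should land close to that mix and give a much smaller MSPE than a simple average of the controls, which is the same contrast the article draws between synthetic California and the average of the other states.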

[Figure: synth results, including the weights assigned to the control states]

Before using the synthetic control method, let's compare the time trends of per capita cigarette consumption in California and in the other states (the value for the other states is their average). Clearly there was already a big gap between the two before the anti-smoking bill was introduced in 1989. With such a gap we cannot cleanly identify the causal effect. We therefore need a matching method that makes the control group and the policy-affected group nearly identical before 1989, so that we can make a clear judgment about the subsequent policy effect.

[Figure: per capita cigarette consumption over time, California vs. the average of the other states]

The following is the picture we care about most, because it shows most intuitively that, before the policy was implemented, per capita cigarette consumption in synthetic California almost perfectly coincides with actual per capita cigarette consumption in California. Since the two series track each other closely before the policy took effect in 1989, any divergence between them after 1989 can be read as the policy effect.

The figure clearly shows that after the 1989 anti-smoking bill, actual per capita cigarette consumption in California is much lower than synthetic per capita cigarette consumption, and the gap keeps widening over time. This suggests that the bill did substantially restrain per capita cigarette consumption in California.

[Figure: per capita cigarette consumption, actual California vs. synthetic California]

The command above does not show everything synth can do, so let us add two more options: xperiod and nested. Their uses are listed below.

synth cigsale beer lnincome retprice age15to24 cigsale(1988) cigsale(1980) cigsale(1975) , trunit(3) trperiod(1989) xperiod(1980(1)1988) nested

Explanation of the new options in the command above:
1. xperiod(1980(1)1988): average the predictors beer, lnincome, retprice and age15to24 over the years 1980, 1981, ..., 1988;
2. nested: use a nested optimization procedure, which can help us find the best fit at the cost of a longer computation time.
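As a small illustration of what xperiod(1980(1)1988) means for a single predictor, the Python sketch below averages a yearly series over that window and contrasts it with a single-year predictor such as cigsale(1988). The numbers and the helper name average_over_period are hypothetical and are not taken from the smoking data set.

```python
def average_over_period(series_by_year, first_year, last_year):
    """Mean of a yearly series over first_year..last_year, inclusive."""
    values = [series_by_year[year] for year in range(first_year, last_year + 1)]
    return sum(values) / len(values)


# Hypothetical retail prices for one state (not real data).
retprice = {1979: 50.0, 1980: 60.0, 1981: 62.0, 1982: 64.0, 1983: 66.0,
            1984: 68.0, 1985: 70.0, 1986: 72.0, 1987: 74.0, 1988: 76.0}
# Hypothetical per capita cigarette sales for the same state.
cigsale = {1975: 127.0, 1980: 120.0, 1988: 90.0}

# Averaged predictor: the counterpart of xperiod(1980(1)1988).
retprice_avg = average_over_period(retprice, 1980, 1988)
# Single-year predictor: the counterpart of cigsale(1988).
cigsale_1988 = cigsale[1988]

print(retprice_avg, cigsale_1988)
```

Each state then enters the matching step with one number per predictor: the period average for the ordinary predictors and the single-year value for the lagged outcomes.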

The output of this SCM run deserves a careful look, because it reports a lot of important information, such as the control group, the predictors, and the years used for fitting.

[Figure: output of the synth command with the xperiod and nested options]

The following figure shows the placebo test we ran: we apply SCM to each of the 39 states (including California) in turn and compute the policy treatment effect (actual per capita cigarette consumption minus synthetic per capita cigarette consumption). The yellow line is the effect obtained when California is the policy-affected group. The figure shows that before 1989 California's actual cigsale and synthetic cigsale were almost the same (the gap was around 0).

Compared with the gaps obtained when each of the other 38 states is treated as the policy-affected group, California's gap is small before 1989 and shows a clear downward trend after 1989. A pattern like this is hard to explain without a policy effect, so the placebo test supports the conclusion that California's anti-smoking bill was effective.

[Figure: placebo test, gaps for all 39 states with California highlighted]

During the period 1970-1989 (before the policy effect appears), some of the other 38 states, when treated as the policy-affected group, produce an MSPE more than 20 times the MSPE obtained when California is the policy-affected group. Once these so-called "outliers" are eliminated, we get the following policy effect chart, which looks much less messy. The yellow line still stands out: before the policy was introduced in 1989 the synthetic control and the policy-affected state were almost identical, and after the policy was introduced the gap between them trends downward. This indicates that California's anti-smoking policy did curb per capita cigarette consumption.
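The outlier rule can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the gap values are invented, the pre-policy period is taken to be every year before 1989, and the function names pre_period_mspe and drop_poor_fits are hypothetical.

```python
def pre_period_mspe(gaps_by_year, policy_year):
    """Mean squared gap (actual minus synthetic) over the years before the policy."""
    pre_gaps = [gap for year, gap in gaps_by_year.items() if year < policy_year]
    return sum(gap * gap for gap in pre_gaps) / len(pre_gaps)


def drop_poor_fits(placebo_gaps, treated_unit, policy_year, max_ratio=20.0):
    """Keep units whose pre-policy MSPE is at most max_ratio times the treated unit's."""
    benchmark = pre_period_mspe(placebo_gaps[treated_unit], policy_year)
    return {unit: gaps for unit, gaps in placebo_gaps.items()
            if pre_period_mspe(gaps, policy_year) <= max_ratio * benchmark}


# Invented gaps for three units (not real results).
placebo_gaps = {
    "california": {1986: 1.0, 1987: -1.0, 1988: 1.0, 1989: -5.0, 1990: -9.0},
    "state_a": {1986: 2.0, 1987: -2.0, 1988: 2.0, 1989: 1.0, 1990: -1.0},
    "state_b": {1986: 10.0, 1987: -10.0, 1988: 10.0, 1989: 12.0, 1990: -8.0},
}

kept = drop_poor_fits(placebo_gaps, "california", policy_year=1989)
print(sorted(kept))
```

In this toy example state_b fits so poorly before the policy that its later gaps carry little information, so it is dropped, while state_a stays in the comparison.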

[Figure: placebo test after eliminating the states with a poor pre-policy fit]

** Placebo test program -------------
* Assumes the smoking panel is already in memory and declared as panel data with tsset.

* Run SCM for each of the 39 states, treating each one in turn as the policy-affected group
forval i=1/39{
qui synth cigsale retprice cigsale(1988) cigsale(1980) cigsale(1975), ///
xperiod(1980(1)1988) trunit(`i') trperiod(1989) keep(synth_`i') replace
}

* Compute the SCM policy effect (actual minus synthetic) for each state
forval i=1/39{
use synth_`i', clear
rename _time years
gen tr_effect_`i' = _Y_treated - _Y_synthetic
keep years tr_effect_`i'
drop if missing(years)
save synth_`i', replace
}

* Merge all 39 policy effects into one data set
use synth_1, clear
forval i=2/39{
qui merge 1:1 years using synth_`i', nogenerate
}

* Collect the 39 placebo lines, then draw the graph once with California (unit 3) highlighted
local lp
forval i=1/39 {
  local lp `lp' line tr_effect_`i' years, lcolor(gs12) ||
}
twoway `lp' line tr_effect_3 years, lcolor(orange) legend(off) xline(1989, lpattern(dash))

The causal inference research team reserves the right

Origin blog.51cto.com/15057855/2679985