Causal Inference | Supplementary Notes on Double Difference Method

After moving to a new environment I have been busy adapting (honestly, it has been a struggle), so I stopped taking notes for a long time. This weekend I finally have a bit of free time, mainly because the epidemic means I can't go anywhere, haha, so here I am writing again~

While preparing a recent presentation assignment, I organized some notes on the double difference (DID) method and am sharing them here, hoping they are useful and encouraging~

This post supplements my earlier study notes on the double difference method. It mainly adds an introduction to (and replication of) a classic paper, plus some ways to handle violations of parallel trends in DID.

Introduction and replication of a classic DID paper

Moser and Voena (2012) published a paper in the AER that uses the classic DID method to test the impact of intellectual property restrictions on innovation. Below I introduce the article and its replication, focusing on the replication.

Although the authors released the data and programs, some of the figures were actually produced in Office, so in this version I implement them through Stata. Working through this replication also made Stata graphics much less intimidating.

Paper content

The [Econometric Circle] official account also introduced this paper as one of the most classic pieces of AER literature for learning the DID method.

If you don't have time to study the full text, a quick browse will give you a general idea of the research, but I strongly recommend reading it closely, because the paper's reasoning and writing are very smooth and rigorous~

Abstract:
Compulsory licensing allows firms in developing countries to produce foreign-owned inventions without the consent of the foreign patent owner (background of the paper). This article uses compulsory licensing under the Trading with the Enemy Act (TWEA) after World War I as an exogenous event to examine the impact of compulsory licensing on domestic invention (research methodology). A difference-in-differences analysis of nearly 130,000 chemical inventions shows that compulsory licensing increased domestic invention by about 20% (study findings).


Conclusion:
This article uses TWEA as a natural experiment to investigate whether compulsory licensing encourages nationals to invent in emerging industries.

Data on chemical patents by U.S. inventors after the TWEA show that compulsory licensing had a strong and persistent positive effect on domestic invention. In USPTO subclasses where at least one enemy-owned patent was licensed to a domestic firm under the TWEA, domestic patents increased by about 20% after the TWEA (relative to unaffected subclasses).

These results are robust to controlling for the number of licenses granted and to accounting for the novelty of licensed patents. They also survive a variety of alternative tests, including a triple difference (comparing the change in patents by US inventors before and after the TWEA with the change for other non-German inventors), controls for subclass- and treatment-specific time trends, and placebo tests with other non-German inventors.

Intention-to-treat (ITT) and instrumental-variable regressions further suggest that the analysis may underestimate rather than overestimate the true impact of licensing.

The historical nature of the data also allows us to examine the timing of these effects. Estimates of annual treatment effects suggest that the effects of licensing began around 1929 (as measured by patent filings) and continued into the 1930s. Compulsory licenses gave US companies the right to produce German inventions, but even with rights to confiscated patents and, in some cases, physical capital, it took years for US companies to acquire the knowledge and skills needed to produce these inventions domestically. Our data suggest that American invention took off only after this long period of learning. These findings are mirrored in changing patterns of scientific citations, which suggest that the US chemical industry gained prominence as an originator of knowledge in the 1930s. The difficult learning process US firms experienced after the TWEA demonstrates that human capital and tacit knowledge are critical for rapid technology transfer between countries.

Paper replication

The replication data can be downloaded from this website (you may need to register an account):
https://www.openicpsr.org/openicpsr/project/112497/version/V1/view

Interpretation of the paper's variables

*-Variable definitions
use chem_patents_maindataset, clear

**Year
label var grntyr "Grant year"

**Patent classification
label var uspto_class "7,248 patent subclasses"
label var main "19 main patent classes"
label var subcl "7,248 patent subclasses"
label var class_id "Numeric code for the 7,248 patent subclasses"
label var licensed_class "=1 if the subclass has a confiscated patent license"
//label var confiscated_class "=1 if the class has a confiscated patent"

**Confiscated patent licenses
label var count_cl "Number of confiscated patent licenses in the subclass (at most 15)"
label var count_cl_2 "Squared number of confiscated patent licenses in the subclass"
label var year_conf "Total remaining lifetime of the subclass's confiscated patent licenses"
label var year_conf_2 "Squared total remaining lifetime of the subclass's confiscated patent licenses"
label var treat "Treatment group: =1 if the subclass has at least one confiscated patent license"

**Patent grant counts
label var count_usa "Number of patents granted to US inventors that year"
label var count_france "Number of patents granted to French inventors that year"
label var count_germany "Number of patents granted to German inventors that year"
label var count "Number of patents granted to inventors of all countries that year"
label var count_for "Number of patents granted to foreign inventors that year (1875-1939)"
label var count_for_2 "Number of patents granted to foreign inventors that year (1877-1939)"
label var count_for_noger "Number of patents granted to non-German foreign inventors that year"

save chem_patents_maindataset, replace

Benchmark regression

*-Table 2: benchmark regressions
use chem_patents_maindataset, clear

**Variables
// Outcome: count_usa "Number of patents granted to US inventors that year"
// Treatment 1: treat "=1 if the subclass has at least one confiscated patent license"
// Treatment 2: count_cl "Number of confiscated patent licenses in the subclass (at most 15)"
// Treatment 3: year_conf "Total remaining lifetime of the subclass's confiscated patent licenses"
// Control 1: count_for "Number of patents granted to foreign inventors that year (1875-1939)"
// Control 2: count_cl_2 "Squared number of confiscated patent licenses"
// Control 3: year_conf_2 "Squared total remaining lifetime of licensed patents"
// Fixed effects: year (grntyr) and patent subclass (class_id)

**Variable labels
label var count_usa "Patents by US inventors"
label var treat "Subclass has at least one license"
label var count_cl "Number of licenses"
label var count_cl_2 "Number of licenses squared"
label var year_conf "Remaining lifetime of licensed patents"
label var year_conf_2 "Remaining lifetime of licensed patents squared(×100)"
label var count_for "Number of patents by foreign inventors"

**Regressions
*** Treatment variable 1: treat
reghdfe count_usa treat count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m1

reghdfe count_usa treat, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m2

***Treatment variable 2: count_cl 
reghdfe count_usa count_cl count_cl_2 count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m3

reghdfe count_usa count_cl count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m4

reghdfe count_usa count_cl, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m5

***Treatment variable 3: year_conf
reghdfe count_usa year_conf year_conf_2 count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m6

reghdfe count_usa year_conf count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m7

reghdfe count_usa year_conf, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m8

**Export results
outreg2 [m1 m2 m3 m4 m5 m6 m7 m8] using Table2.xls, ///
		tstat adjr2 nocons dec(3) label replace ///
		keep(treat count_cl count_cl_2 year_conf year_conf_2 count_for) ///
		sortvar(treat count_cl count_cl_2 year_conf year_conf_2 count_for) ///
		title("Table2") ctitle(" ")  ///
		addtext(Subclass fixed effects,Yes,Year fixed effects,Yes, ///
				Number of subclasses,7248)
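
For readers who prefer an equation, the benchmark regressions above correspond to a two-way fixed-effects DID of roughly this form (my own sketch; the notation is mine, not the authors', and treat is nonzero only after the TWEA for licensed subclasses):

```latex
\text{count\_usa}_{ct}
  = \alpha_c + \gamma_t
  + \beta\,\text{treat}_{ct}
  + \delta\,\text{count\_for}_{ct}
  + \varepsilon_{ct}
```

where $\alpha_c$ are subclass fixed effects (class_id), $\gamma_t$ are year fixed effects (grntyr), and $\beta$ is the DID estimate; the other columns swap treat for count_cl or year_conf (plus their squares).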

Parallel trend test

To be honest, I don't fully understand the authors' way of testing parallel trends here. Their method differs from the usual approach of interacting time dummies with the policy variable: they estimate the pre-event time trends (year-dummy coefficients) separately for the treatment group and the control group, and conclude that the parallel-trends assumption holds because the two estimated trends track each other closely.

*-Figure 4: compare pre-treatment trends of the treated and control groups
use chem_patents_maindataset, clear

**Generate treatment- and control-group year dummies
forvalues x=1875/1939 {
	gen untreat_`x'=1 if licensed==0 & grn==`x' // control group
	replace untreat_`x'=0 if untreat_`x'==.
	gen treat_`x'=1 if licensed==1 & grn==`x'  // treatment group
	replace treat_`x'=0 if treat_`x'==.
	}
		
**Regression (1900 is the omitted base year)
drop *treat_1900
xtreg count_usa treat_* untreat_*, fe i(class_id) robust cluster(class_id)

**Store treatment-group coefficients
sort uspto_class grntyr
gen coef_treat = . 
gen se_treat = . 
***Store coefficients and standard errors
local j = 1 
foreach var of var treat_* {
	replace coef_treat = _b[`var'] in `j'
	replace se_treat = _se[`var'] in `j'
	local ++j // next observation row
}
***Generate confidence-interval bounds
gen up_ci_treat = coef_treat + 1.96*se_treat
gen low_ci_treat = coef_treat - 1.96*se_treat

**Store control-group coefficients
sort uspto_class grntyr
gen coef_control = . 
gen se_control = . 
***Store coefficients and standard errors
local j = 1 
foreach var of var untreat_* {
	replace coef_control = _b[`var'] in `j'
	replace se_control = _se[`var'] in `j'
	local ++j 
}
***Generate confidence-interval bounds
gen up_ci_control = coef_control + 1.96*se_control
gen low_ci_control = coef_control - 1.96*se_control

**Plot the pre-treatment trends
keep in 1/45 // keep the first 45 rows (1875-1919)
rename grntyr Year
replace Year = Year+1 if Year>=1900
drop if Year == 1920
insobs 1
replace Year = 1900 if Year==.
tsset Year


tw  (line coef_treat Year if Year<1900, yline(0) lwidth(medthin) lcolor(navy) ///
	ytitle("Coefficients for year dummies")	xtitle("")) ///
	(line coef_treat Year if Year>1900, yline(0) lwidth(medthin) lcolor(navy) ///
	ytitle("Coefficients for year dummies")	xtitle("")) ///
	(line up_ci_treat Year, lwidth(medthin) lcolor(navy) lp(dash) cmissing(n)) ///
	(line low_ci_treat Year, lwidth(medthin) lcolor(navy) lp(dash) cmissing(n)) ///	
	(line coef_control Year if Year<1900, yline(0) lwidth(medthin)  lp(solid) lcolor(maroon)) ///
	(line coef_control Year if Year>1900, yline(0) lwidth(medthin)  lp(solid) lcolor(maroon)) ///
	(line up_ci_control Year, lwidth(medthin) lcolor(maroon) lp(dash) cmissing(n)) ///
	(line low_ci_control Year, lwidth(medthin) lcolor(maroon) lp(dash) cmissing(n) ///		
	legend(order(1 "Treated subclasses" 5 "Untreated subclasses") ///
	  pos(6) col(2) size(small))) , ///
	yline(0, lwidth(vthin) lpattern(dash)) ///
	xlabel(1875(5)1919, labsize(small))	

gr export "Figure4.pdf",replace	


Policy Dynamic Treatment Effects

*-Figures 6-8: annual treatment effects for the treated group
use chem_patents_maindataset, clear


**Generate treatment variables
// Outcome: count_usa "Number of patents granted to US inventors that year"
// Treatment 1: treat "=1 if the subclass has at least one confiscated patent license"
// Treatment 2: count_cl "Number of confiscated patent licenses in the subclass (at most 15)"
// Treatment 3: year_conf "Total remaining lifetime of the subclass's confiscated patent licenses"
// Control: count_for "Number of patents granted to foreign inventors that year (1875-1939)"

forvalues x=1876/1939 {
	gen td_`x'=0
	qui replace td_`x'=1 if grn==`x'
	}

foreach var in treat count_cl year_conf {
forvalues x=1919/1939 {
	cap gen `var'_`x'=`var' if grn==`x'
	qui replace `var'_`x'=0 if grn!=`x'
	}
}

**Figure 6: treatment variable 1, treat
preserve
***Regression
xtreg count_usa treat_* count_for td*, fe i(class_id) robust cluster(class_id)
***Generate variables to store coefficients
sort uspto_class grntyr
gen coef = . 
gen se = . 
***Store coefficients and standard errors
local j = 1 
foreach var of var treat_* {
	replace coef = _b[`var'] in `j'
	replace se = _se[`var'] in `j'
	local ++j // next observation row
}
***Generate confidence-interval bounds
gen up_ci = coef + 1.96*se
gen low_ci = coef - 1.96*se
**Plot the annual effects
keep in 1/21 // keep the first 21 rows (1919-1939)
rename grntyr Year
replace Year = Year+44

tw  (line coef Year, yline(0) lwidth(medthin) lcolor(navy) ///
	ytitle("Annual treatment effect") xtitle("") ///
	title("Treat: Subclass has at least one license")) ///
	(line up_ci Year, lwidth(medthin) lcolor(navy) lp(dash)) ///
	(line low_ci Year, lwidth(medthin) lcolor(navy) lp(dash) ///
	legend(off)), yline(0, lwidth(vthin) lpattern(dash)) ///
	xlabel(1919(5)1939, labsize(small))		

gr export "Figure6.pdf",replace	
restore	
	
	
**Figure 7: treatment variable 2, count_cl
preserve
***Regression
xtreg count_usa count_cl_1919-count_cl_1939 count_for td*, fe i(class_id) robust cluster(class_id)
***Generate variables to store coefficients
sort uspto_class grntyr
gen coef = . 
gen se = . 
***Store coefficients and standard errors
local j = 1 
foreach var of var count_cl_1919-count_cl_1939 {
	replace coef = _b[`var'] in `j'
	replace se = _se[`var'] in `j'
	local ++j // next observation row
}
***Generate confidence-interval bounds
gen up_ci = coef + 1.96*se
gen low_ci = coef - 1.96*se
**Plot the annual effects
keep in 1/21 // keep the first 21 rows (1919-1939)
rename grntyr Year
replace Year = Year+44

tw  (line coef Year, yline(0) lwidth(medthin) lcolor(navy) ///
	ytitle("Annual treatment effect") xtitle("") ///
	title("Treat: Number of licenses")) ///
	(line up_ci Year, lwidth(medthin) lcolor(navy) lp(dash)) ///
	(line low_ci Year, lwidth(medthin) lcolor(navy) lp(dash) ///
	legend(off)), yline(0, lwidth(vthin) lpattern(dash)) ///
	xlabel(1919(5)1939, labsize(small))		

gr export "Figure7.pdf",replace	
restore	
	
	
**Figure 8: treatment variable 3, year_conf
preserve
***Regression
xtreg count_usa year_conf_1919-year_conf_1939 count_for td*, fe i(class_id) robust cluster(class_id)
***Generate variables to store coefficients
sort uspto_class grntyr
gen coef = . 
gen se = . 
***Store coefficients and standard errors
local j = 1 
foreach var of var year_conf_1919-year_conf_1939 {
	replace coef = _b[`var'] in `j'
	replace se = _se[`var'] in `j'
	local ++j // next observation row
}
***Generate confidence-interval bounds
gen up_ci = coef + 1.96*se
gen low_ci = coef - 1.96*se
**Plot the annual effects
keep in 1/21 // keep the first 21 rows (1919-1939)
rename grntyr Year
replace Year = Year+44

tw  (line coef Year, yline(0) lwidth(medthin) lcolor(navy) ///
	ytitle("Annual treatment effect") xtitle("") ///
	title("Treat: Remaining lifetime of licensed patents")) ///
	(line up_ci Year, lwidth(medthin) lcolor(navy) lp(dash)) ///
	(line low_ci Year, lwidth(medthin) lcolor(navy) lp(dash) ///
	legend(off)), yline(0, lwidth(vthin) lpattern(dash)) ///
	xlabel(1919(5)1939, labsize(small))	

gr export "Figure8.pdf",replace	
restore	


Triple difference to control for confounding factors

*-Figure 9: triple-difference test
use "fig10.dta", clear
**Regression
xtreg y usa_treat_td1919-usa_treat_td1939 usa_td* usa_treat treat_td* usa td_*, fe i(class_id) robust cluster(class_id)

**Generate variables to store coefficients
gen year = _n in 1/21 
replace year = year+1918
gen coef = . 
gen se = . 

***Store coefficients and standard errors
local j = 1 
foreach var of var usa_treat_td* {
	replace coef = _b[`var'] in `j'
	replace se = _se[`var'] in `j'
	local ++j // next observation row
}
***Generate confidence-interval bounds
gen up_ci = coef + 1.96*se
gen low_ci = coef - 1.96*se
**Plot the annual effects
keep in 1/21 // keep the first 21 rows (1919-1939)

tw  (line coef year, yline(0) lwidth(medthin) lcolor(navy) ///
	ytitle("Annual treatment effect: Triple difference") xtitle("")) ///
	(line up_ci year, lwidth(medthin) lcolor(navy) lp(dash)) ///
	(line low_ci year, lwidth(medthin) lcolor(navy) lp(dash) ///
	legend(off)), yline(0, lwidth(vthin) lpattern(dash)) ///
	xlabel(1919(5)1939, labsize(small))	

gr export "Figure9.pdf",replace	

Robustness check (partial)

  • Placebo tests: constructing a sham treatment sample
*-Figure 10: placebo test (robustness check)
use chem_patents_maindataset, clear
**Generate variables
forvalues x=1876/1939 {
	gen td_`x'=0
	qui replace td_`x'=1 if grn==`x'
	}

foreach var in treat {
forvalues x=1919/1939 {
	cap gen `var'_`x'=`var' if grn==`x'
	qui replace `var'_`x'=0 if grn!=`x'
	}
}

forvalues x=1919/1939 {
	cap gen untreat_`x'= 1 if licensed==0 & grn==`x' 
	qui replace untreat_`x'=0 if untreat_`x'==.
}

**Treatment-group regression (outcome: French patents, as the placebo)
xtreg count_france treat_* td*, fe i(class_id) robust cluster(class_id)

**Generate variables to store coefficients
gen year = _n in 1/21 
replace year = year+1918
gen coef_treat = . 
gen se_treat = . 

***Store coefficients and standard errors
local j = 1 
foreach var of var treat_* {
	replace coef_treat = _b[`var'] in `j'
	replace se_treat = _se[`var'] in `j'
	local ++j // next observation row
}
***Generate confidence-interval bounds
gen up_ci_treat = coef_treat + 1.96*se_treat
gen low_ci_treat = coef_treat - 1.96*se_treat


**Control-group regression
xtreg count_france untreat_* td*, fe i(class_id) robust cluster(class_id)

**Generate variables to store coefficients
gen coef_control = . 
gen se_control = . 

***Store coefficients and standard errors
local j = 1 
foreach var of var untreat_* {
	replace coef_control = _b[`var'] in `j'
	replace se_control = _se[`var'] in `j'
	local ++j // next observation row
}
***Generate confidence-interval bounds
gen up_ci_control = coef_control + 1.96*se_control
gen low_ci_control = coef_control - 1.96*se_control

**Plot the effects
keep in 1/21 // keep the first 21 rows (1919-1939)

tw  (line coef_treat year, yline(0) lwidth(medthin) lcolor(navy) ///
	ytitle("Coefficients for year dummies")	xtitle("")) ///
	(line up_ci_treat year, lwidth(medthin) lcolor(navy) lp(dash)) ///
	(line low_ci_treat year, lwidth(medthin) lcolor(navy) lp(dash)) ///	
	(line coef_control year, yline(0) lwidth(medthin)  lp(solid) lcolor(maroon)) ///
	(line up_ci_control year, lwidth(medthin) lcolor(maroon) lp(dash)) ///
	(line low_ci_control year, lwidth(medthin) lcolor(maroon) lp(dash) ///		
	legend(order(1 "Treated subclasses" 4 "Untreated subclasses") ///
	  pos(6) col(2) size(small))) , ///
	yline(0, lwidth(vthin) lpattern(dash)) ///
	xlabel(1919(5)1939, labsize(small))		
	
gr export "Figure10.pdf",replace	
  • Removing noisy samples
*-Table 6: drop newly created subclasses (robustness check)
use chem_patents_maindataset, clear

**Drop samples
sort uspto_class grn
bys uspto: gen ccc=sum(count)
foreach var in count_usa count {
	qui replace `var'=. if ccc==0 
	}
gen aaa=1 if ccc==0 & grn==1919
bys uspto: egen bbb=max(aaa)
drop if bbb==1
drop if ccc==0
drop aaa bbb ccc

**Variable labels
label var count_usa "Patents by US inventors"
label var treat "Subclass has at least one license"
label var count_cl "Number of licenses"
label var count_cl_2 "Number of licenses squared"
label var year_conf "Remaining lifetime of licensed patents"
label var year_conf_2 "Remaining lifetime of licensed patents squared(×100)"
label var count_for "Number of patents by foreign inventors"

**Regressions
*** Treatment variable 1: treat
reghdfe count_usa treat count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m1

reghdfe count_usa treat, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m2

***Treatment variable 2: count_cl 
reghdfe count_usa count_cl count_cl_2 count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m3

reghdfe count_usa count_cl count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m4

reghdfe count_usa count_cl, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m5

***Treatment variable 3: year_conf
reghdfe count_usa year_conf year_conf_2 count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m6

reghdfe count_usa year_conf count_for, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m7

reghdfe count_usa year_conf, absorb(grntyr class_id) vce(cluster class_id) keepsingletons 
est store m8

**Export results
outreg2 [m1 m2 m3 m4 m5 m6 m7 m8] using Table6.xls, ///
		tstat adjr2 nocons dec(3) label replace ///
		keep(treat count_cl count_cl_2 year_conf year_conf_2 count_for) ///
		sortvar(treat count_cl count_cl_2 year_conf year_conf_2 count_for) ///
		title("Table6") ctitle(" ")  ///
		addtext(Subclass fixed effects,Yes,Year fixed effects,Yes, ///
				Number of subclasses,7248)

Handling of Parallel Trend Violations

Parallel trends assumption: in the absence of the policy, the treatment and control groups would have changed in the same way over time. In practice we test it by checking that the two groups share a common trend before the policy is implemented.
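
Stated formally (my own sketch in potential-outcomes notation, not part of the original notes), with $Y_{it}(0)$ the untreated outcome and $D_i$ the treatment indicator, parallel trends says:

```latex
% Parallel trends: untreated outcomes would have moved in parallel
E\bigl[\,Y_{it}(0)-Y_{is}(0)\mid D_i=1\,\bigr]
  = E\bigl[\,Y_{it}(0)-Y_{is}(0)\mid D_i=0\,\bigr], \qquad s<t.

% Under this assumption the DID contrast identifies the ATT:
\hat{\beta}_{DID}
  = \bigl(\bar{Y}^{\,T}_{post}-\bar{Y}^{\,T}_{pre}\bigr)
  - \bigl(\bar{Y}^{\,C}_{post}-\bar{Y}^{\,C}_{pre}\bigr).
```

The pre-trend plots in Figure 4 are an indirect check of this assumption on the pre-policy years.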

Reason for violation: assignment to the treatment and control groups is not random.

Solutions


Triple difference method

Introduction


Implementation

*Variable definitions
// y: outcome variable
// treat: binary treatment variable (1: treated; 0: control/untreated)
// time: binary period variable (1: after the experiment; 0: before)
// trend: triple-difference variable (1: affected by the confounding factor; 0: unaffected)
// $z: covariates that affect the outcome y but not the treatment treat


*-diff command 
//help diff

**Triple difference WITHOUT covariates
diff y, t(treat) p(time) ddd(trend)  // baseline
diff y, t(treat) p(time) ddd(trend) robust  // robust standard errors
diff y, t(treat) p(time) ddd(trend) cluster(id)  // clustered standard errors
diff y, t(treat) p(time) ddd(trend) bs reps(200)  // bootstrap estimates and standard errors


**Triple difference WITH covariates
diff y, t(treat) p(time) ddd(trend) cov($z)  // baseline
diff y, t(treat) p(time) ddd(trend) cov($z) robust  // robust standard errors
diff y, t(treat) p(time) ddd(trend) cov($z) cluster(id)  // clustered standard errors
diff y, t(treat) p(time) ddd(trend) cov($z) bs reps(200)  // bootstrap estimates and standard errors


*-OLS command (adding interaction terms)
//help reghdfe or help reg or help xtreg

reg y treat time trend c.treat#c.time c.treat#c.trend ///
			c.time#c.trend c.treat#c.time#c.trend  // baseline
			
reg y treat time trend c.treat#c.time c.treat#c.trend ///
			c.time#c.trend c.treat#c.time#c.trend, robust  // robust standard errors
			
reg y treat time trend c.treat#c.time c.treat#c.trend ///
			c.time#c.trend c.treat#c.time#c.trend, cluster(id)  // clustered standard errors
			
reg y treat time trend c.treat#c.time c.treat#c.trend ///
			c.time#c.trend c.treat#c.time#c.trend $z, cluster(id)  // clustered SEs + covariates
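
Written out, the OLS version above estimates the standard DDD specification (a sketch; the coefficient of interest is the one on the triple interaction):

```latex
y_i = \beta_0 + \beta_1 \text{treat}_i + \beta_2 \text{time}_i + \beta_3 \text{trend}_i
    + \beta_4 (\text{treat}_i \times \text{time}_i)
    + \beta_5 (\text{treat}_i \times \text{trend}_i)
    + \beta_6 (\text{time}_i \times \text{trend}_i)
    + \beta_7 (\text{treat}_i \times \text{time}_i \times \text{trend}_i)
    + \varepsilon_i
```

Here $\beta_7$ is the triple-difference estimate: the extra (trend) difference nets out whatever the confounding factor does to treated units over time.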


Synthetic control method

Introduction

Although the synthetic control method originally applies only to settings with few treated units, scholars have recently combined SCM with DID so that it can handle many treated units. Teacher Lian Yujun also introduced this at a conference.

The classic paper on the application of SCM is Abadie et al. (2010), which studies California's tobacco control program. It is also worth learning from, and the authors released the data and programs.

Abadie-2010-Synthetic Control Methods for Comparative Case Studies

Implementation

Take Abadie et al.'s (2010) paper as an example.

*Variable definitions
// y: outcome variable
// treat: binary treatment variable (1: treated; 0: control/untreated)
// time: binary period variable (1: after the experiment; 0: before)
// $z: covariates that affect the outcome y but not the treatment treat


*Install the commands
ssc install synth
ssc install synth2 // more commonly used and more mature


*-synth2 syntax
synth2 depvar indepvars, trunit(#) trperiod(#) [options]

**Required
// depvar: outcome variable
// indepvars: predictor variables
// trunit(#): the treated unit (the treated region)
// trperiod(#): the period in which the policy intervention starts

**Options
// ctrlunit(numlist): control units used as the donor pool
// preperiod(numlist): pre-treatment periods before the intervention
// postperiod(numlist): post-treatment periods at and after the intervention
// xperiod(numlist): periods over which the predictors in indepvars are averaged
// mspeperiod(numlist): periods over which the mean squared prediction error (MSPE) is minimized
// customV(numlist): supply custom V-weights for the predictors' pre-treatment predictive power
// nested: nested optimization searching over all (diagonal) positive semidefinite V matrices and sets of W-weights
// allopt: if nested is specified, obtain fully robust results
// placebo([{unit|unit(numlist)} period(numlist) cutoff(#_c)]): in-space placebo tests using fake treated units and/or in-time placebo tests using fake treatment times


*-Case study (California smoking)
global path = ""
cd $path

use "california", clear
xtset state year

**Generate the treatment variable for California after the policy
gen treat=0
replace treat=1 if year > 1989 & state==3  // California is unit 3; the tobacco control law took effect in 1989

**Visualize the sample
panelview cigsale treat, i(state) t(year) type(treat)
panelview cigsale treat, i(state) t(year) type(outcome) prepost


**Synthetic control estimation
// 4 predictors: lnincome age15to24 retprice beer
// 3 lagged-outcome predictors: cigsale(1975) cigsale(1980) cigsale(1988), i.e. per-capita cigarette sales in 1975, 1980 and 1988
// trunit(3): the treated region is California (unit 3)
// trperiod(1989): the policy intervention starts in 1989
// xperiod(1980(1)1988): periods over which the predictors are averaged
// placebo: in-space placebo test with fake treated units; refit the model for each donor and test whether the ATT over the period differs significantly from zero (H0: ATT = 0)

synth2 cigsale lnincome age15to24 retprice beer ///
	cigsale(1988) cigsale(1980) cigsale(1975), ///
	trunit(3) trperiod(1989) xperiod(1980(1)1988) placebo(unit cutoff(2))


graph display eff_pboUnit // display the placebo-effects graph
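
Conceptually, what synth/synth2 computes can be sketched in the notation of Abadie et al. (2010): choose nonnegative donor weights that best reproduce California's pre-1989 characteristics, then read the treatment effect off the post-1989 gap:

```latex
\hat{\alpha}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j^{*}\, Y_{jt},
\qquad
W^{*} = \arg\min_{W}\; (X_1 - X_0 W)'\,V\,(X_1 - X_0 W)
\quad \text{s.t. } w_j \ge 0,\ \sum_j w_j = 1
```

Here $X_1$ and $X_0$ hold the pre-treatment predictors (the indepvars and lagged cigsale values above) and $V$ weights the predictors (the customV/nested options).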

Propensity score matching

Introduction

In recent years propensity score matching has often been combined with DID; it is another way to address violations of the parallel-trends assumption, although there are some caveats in using it.

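
As a sketch of the idea (my notation, not from the cited paper): first estimate the propensity score, i.e. the probability of treatment given covariates,

```latex
p(x) = \Pr(D_i = 1 \mid X_i = x)
```

then match treated and control observations on $p(X_i)$, which balances the covariates across groups, and finally run the usual DID regression restricted to the matched sample (this is what the `if _weight != .` condition does in the code below).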

Implementation

Take the paper by Yu Minggui (余明桂) et al., published in China Industrial Economics, as an example.

Yu Minggui et al. China's Industrial Policy and Enterprise Technology Innovation[J]. China Industrial Economy, 2016(12).

Single-period matching

*-Data preparation
use 余明桂-2018, clear

**Variable definitions
keep if ingroup !=.
rename flnpat110 lnPatent
keep lnPatent ingroup inyear Size Lev Roa PPE Capital Cash Age Gdpr year indc

***Covariates
global xlist Size Lev Roa PPE Capital Cash Age Gdpr  // define covariates
label var Size "Firm size"
label var Lev "Leverage (debt-to-asset ratio)"
label var Roa "Return on assets"
label var Capital "Capital expenditure"
label var PPE "Fixed assets"
label var Age "Firm age"
label var Cash "Cash holdings"
label var Gdpr "GDP growth rate of the firm's region"

***Main variables
label var lnPatent "Patent output"  // outcome variable
label var ingroup  "Industry encouraged by the 10th and 11th Five-Year Plans" // treatment variable
label var inyear "11th Five-Year-Plan policy in effect" // time variable


save psmdata, replace

*Regressions
use psmdata, clear

**Compute propensity scores: nearest-neighbor matching within a caliper
psmatch2 ingroup $xlist, outcome(lnPatent) logit ///
		 neighbor(1) ties common ate caliper(0.05)

**Validity checks
***Balancing test
pstest, both graph saving(balancing_assumption, replace)
graph export "balancing_assumption.emf", replace
***Common support assumption
psgraph, saving(common_support, replace)
graph export "common_support.emf", replace


**Kernel density plots of the propensity scores
sum _pscore if ingroup == 1, detail

***Before matching
sum _pscore if ingroup == 0, detail
twoway(kdensity _pscore if ingroup == 1, lpattern(solid)       				///
			lcolor(black) lwidth(thin) scheme(qleanmono)       				///
			ytitle("Kernel density",   										///
				size(medlarge) orientation(h))               				///
			xtitle("Propensity score before matching",            			///
				size(medlarge))                          					///
			xline(0.6777 , lpattern(solid) lcolor(black))   				///
			xline(`r(mean)', lpattern(dash)  lcolor(black))   				///
		saving(kensity_cs_before, replace)) 								///
		(kdensity _pscore if ingroup == 0, lpattern(dash)),  				///
			xlabel(  , labsize(medlarge) format(%02.1f))    	 			///
			ylabel(0(1)4, labsize(medlarge))     							///
			legend(label(1 "Treatment group")  								///
            label(2 "Control group")       									///
            size(medlarge) position(1) symxsize(10))

graph export "kensity_cs_before.emf", replace

discard

***After matching
sum _pscore if ingroup == 0 & _weight != ., detail
twoway(kdensity _pscore if ingroup == 1, lpattern(solid)                     ///
			lcolor(black)   lwidth(thin)   scheme(qleanmono)				 ///
			ytitle("Kernel density",                						 ///
					size(medlarge) orientation(h))                           ///
			xtitle("Propensity score after matching",                        ///
                    size(medlarge))                                          ///
			xline(0.6777, lpattern(solid) lcolor(black))                     ///
			xline(`r(mean)', lpattern(dash)  lcolor(black))                  ///
	saving(kensity_cs_after, replace))                             			 ///
	(kdensity _pscore if ingroup == 0 & _weight !=., lpattern(dash)),       ///
			xlabel(, labsize(medlarge) format(%02.1f))                       ///
			ylabel(0(1)4, labsize(medlarge))                               	///
			legend(label(1 "Treatment group")                              	///
			label(2 "Control group")                                        ///
            size(medlarge) position(1) symxsize(10))

graph export "kensity_cs_after.emf", replace


**Compare regression results
***DID
reghdfe lnPatent c.ingroup##c.inyear $xlist, absorb(year indc) vce(robust)
est store m1

***PSM-DID
reghdfe lnPatent c.ingroup##c.inyear $xlist if _weight!=., absorb(year indc) vce(robust)
est store m2

***Export results
local mlist_1 "m1 m2"
reg2docx `mlist_1' using 回归结果对比1.docx, b(%6.4f) t(%6.4f)       		 ///
         scalars(N r2_a(%6.4f)) noconstant  replace                          ///
         mtitles("DID" "PSM-DID")  title("DID and cross-sectional PSM-DID results")
Multi-period matching

*-Year-by-year PSM-DID
**Pre-matching descriptives and propensity scores
***Nearest-neighbor matching within a caliper (1:2)
use psmdata, clear
forvalue i = 2001/2010 {
      preserve
          capture {
              keep if year == `i'
              set seed 0000
              gen norvar_2 = rnormal()
              sort norvar_2
              psmatch2 ingroup $xlist, outcome(lnPatent) logit neighbor(2)  ///
                                        ties common ate caliper(0.05)
              save `i'.dta, replace
              }
      restore
      }

clear all

use 2001.dta, clear

forvalue k = 2002/2010 {
      capture {
          append using `k'.dta
          }
      }
	  
save yby_psmdata.dta, replace


**Compare regression results
use yby_psmdata, clear
***DID
reghdfe lnPatent c.ingroup##c.inyear $xlist, absorb(year indc) vce(robust)
est store m1

***Year-by-year PSM-DID
reghdfe lnPatent c.ingroup##c.inyear $xlist if _weight!=., absorb(year indc) vce(robust)
est store m2

***Export results
local mlist_1 "m1 m2"
reg2docx `mlist_1' using 回归结果对比2.docx, b(%6.4f) t(%6.4f)       		 ///
         scalars(N r2_a(%6.4f)) noconstant  replace                          ///
         mtitles("DID" "PSM-DID")  title("DID and year-by-year PSM-DID results")

Summary

A very rough summary. This post only covers the classic DID setup, but nowadays papers more often use staggered DID, and when staggered DID is estimated with two-way fixed effects, heterogeneity in treatment effects can bias the estimated coefficient. In recent years scholars have proposed several estimators to overcome this. They are not covered in this post; I hope to write about them when I have time (too lazy, I don't want to write it now)~

Finally, a small flag: since I can't go out these days, I hope that by the end of the year I can travel and relax (going home counts too...) and see the great rivers and mountains of the motherland~ Having seen lights and rainbows, I have to work harder to keep the confidence to see the sun, haha~

Origin blog.csdn.net/Claire_chen_jia/article/details/128065470