PS: I am a pure novice who wrote a graduation thesis in my senior year. This article is just a pure record of my thoughts on learning other people's thesis, not professional! ! !
Table of contents
benchmark regression
Article and code source: China Industrial Economics "Digitalization of Tax Collection and Management and Internal Salary Gap in Enterprises"
xtset
xtset code year//定义面板数据
xtset defines panel data
code is the interface
year is time series
xtreg
xtreg gap gtp i.year i.ind,fe vce(cluster code)
est store m1
xtreg fixed effects model.
gap is the dependent variable
gtp is an independent variable
i.year, i, ind represent dummy variables
fe means fixed effect fixed effect
vce (cluster code) indicates that the standard error is adjusted by clustering at the company level
Combining with the literature, should the fixed effects of the year and the fixed effects of the enterprise and industry be controlled at this time? The article does not say what year\ind means, the guess should be the year and industry
Interpretation of the results:
number of obs is the number of samples
Within\between\overall in R2 indicates the goodness of fit within the group, between groups and overall. Generally, the fixed effect is within the group, and the random effect is between the groups.
Obs per group means each group of observations?
corr(u_i,xb) is the correlation coefficient between the individual effect u_i and the fitted value. If the individual effect is regarded as a fixed factor that does not change over time, such as personal consumption habits, national social system, regional characteristics, gender, etc., the corresponding model is called a "fixed effect" model. If the individual effect is regarded as a random factor, that is, the individual effect is set as a part of the interference item, it is called a "random effect" model. The random effect model strictly requires that the individual effect is not correlated with the explanatory variables, that is, corr(u_i,xb)=0, so the fixed effect model is used in this paper.
F(55,3365) 55 is the number of parameters k, and 3365 is nk?
Prob > F=0.0000 because the null hypothesis of the F test is that the coefficients in front of each variable are zero. So the null hypothesis is rejected and the model is accepted.
Coefficient _
Robust std. err. refers to the cluster adjustment standard error.
What is the adjustment of the introduction reference link Cluster Robust Standard Error? , the article has a thorough understanding of the misunderstanding of the clustering adjustment standard, and those with a measurement basis can be healthy.
The t value is generally used to test individual partial regression coefficients. p>|t| indicates that a small probability event has occurred, and the null hypothesis is rejected. H0 hypothesis: the coefficient in front of the variable is zero.
95% Conf. Interval refers to the confidence interval Confidence Interval 95%.
xtreg gap gtp size lev roa labor age cash indratio top1 soe i.year i.ind ,fe vce(cluster code)
est store m2
On the basis of m1, control variables are added to control the enterprise level (size lev roa labor age cash indratio top1 soe)
xtreg gap gtp size lev roa labor age cash indratio top1 soe olddep avgwage lnpgdp i.year i.ind i.prov,fe vce(cluster code)
est store m3
m3 further controls province fixed effects and regional economic development characteristic variables (olddep avgwage lnpgdp)
It is found that there is a collinearity problem, and I don't know how to do it at present? Ask my guide next time (●'◡'●)
estab
esttab m1 m2 m3 using 基准回归.rtf,scalar(N r2_a) drop(*.year *.prov *.ind) compress star(* 0.1 ** 0.05 *** 0.01) mtitles nogap b(%6.4f) t(%6.4f)
esttab m1 m2 m3: display the specified regression results
scalars (r2_a N) means to display the adjusted R2 and sample size in the table
drop delete
compress makes the result more compact
star(* 0.1 ** 0.05 *** 0.01) Mark the significance level with a small star
mtitles(titlelist) specifies the titles of the model in the table titles
The nogap command causes blank lines between two arguments to be removed
b(%6.4f) floating-point number output format, which means that no matter how many digits the result has, the output result occupies at least six tab characters, that is, six positions, if it is not enough, fill it with spaces, and it can exceed (these 6 positions), And retain four decimal places.