Reproducing empirical paper results + Stata code explanation

PS: I am a complete novice writing my graduation thesis in my senior year. This article is only a record of my own thoughts while studying other people's papers. It is not professional!!!

Table of contents

benchmark regression

xtset

xtreg

esttab


benchmark regression

Article and code source: China Industrial Economics "Digitalization of Tax Collection and Management and Internal Salary Gap in Enterprises"


xtset

xtset code year  // define the panel data

xtset declares the data to be panel data

code is the cross-sectional (panel) identifier, here the firm code

year is the time variable
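To see what xtset does, here is a minimal sketch on a made-up toy panel. The data are hypothetical (5 firms, 4 years) and are not the paper's data; only the variable names code and year are taken from the command above.

```stata
* Hypothetical toy panel: 5 firms observed in 2010-2013 (not the paper's data)
clear
set obs 20
generate code = ceil(_n / 4)              // firm identifier 1..5
bysort code: generate year = 2009 + _n    // years 2010..2013 within each firm

xtset code year                           // panel variable first, time variable second
xtdescribe                                // summarizes the panel structure
```

xtset should report the panel as strongly balanced here, because every firm has an observation in every year.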


xtreg

xtreg gap gtp i.year i.ind,fe vce(cluster code)
est store m1

xtreg with the fe option estimates a fixed-effects (within) panel model.

gap is the dependent variable

gtp is the independent variable of interest

i.year and i.ind are factor-variable notation: Stata creates a dummy variable for each year and each ind category

fe stands for fixed effects; because the panel variable set by xtset is code, these are firm fixed effects

vce(cluster code) indicates that the standard errors are clustered at the firm level

Compared with the paper, this specification should therefore control for firm fixed effects (fe), year fixed effects (i.year), and industry fixed effects (i.ind). The article does not say what year and ind mean; my guess is year and industry.
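A rough way to check that fe really absorbs the firm effects: on simulated data, the slope from xtreg, fe should match the slope from plain OLS with explicit firm and year dummies. This is only a sketch; the data and the names a, x, y are hypothetical, and only code and year follow the commands above.

```stata
* Simulated panel (hypothetical data): 50 firms, 6 years each
clear
set seed 12345
set obs 300
generate code = ceil(_n / 6)
bysort code: generate year = 2009 + _n
generate a = rnormal()
bysort code (year): replace a = a[1]      // firm effect, constant within firm
generate x = rnormal() + 0.5 * a          // regressor correlated with the firm effect
generate y = 1 + 2 * x + a + rnormal()
xtset code year

* (1) within estimator: firm effects absorbed by fe, year effects via i.year
xtreg y x i.year, fe vce(cluster code)
scalar b_fe = _b[x]

* (2) OLS with explicit firm and year dummies
regress y x i.year i.code, vce(cluster code)
scalar b_lsdv = _b[x]

display "xtreg, fe: " b_fe "   OLS with dummies: " b_lsdv
```

The two coefficients on x should agree up to numerical precision, which is why no i.code term is needed in the xtreg command.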

Interpretation of the results:

Number of obs is the number of observations (the sample size)

Within\between\overall in R2 indicate the goodness of fit within groups, between groups, and overall. Generally, a fixed-effects model looks at the within R2, and a random-effects model at the between R2.

Obs per group gives the number of observations per group (here per firm): the minimum, average, and maximum

corr(u_i, Xb) is the correlation between the estimated individual effect u_i and the fitted values Xb. If the individual effect is treated as a fixed factor that does not change over time (for example personal consumption habits, a country's social system, regional characteristics, or gender), the model is called a "fixed effects" model. If the individual effect is treated as a random factor, that is, as part of the error term, the model is called a "random effects" model. The random effects model strictly requires the individual effect to be uncorrelated with the explanatory variables, i.e. corr(u_i, Xb) = 0; this is why the paper uses the fixed effects model instead.
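The choice between fixed and random effects is often checked with a Hausman test. The paper's code does not include one, so the following is only a sketch on simulated data; the data and the names a, x, y, fixed, and random are hypothetical. The test is run without vce(cluster), since hausman is not meant for cluster-robust estimates.

```stata
* Simulated panel (hypothetical data) where x is correlated with the firm effect
clear
set seed 12345
set obs 300
generate code = ceil(_n / 6)
bysort code: generate year = 2009 + _n
generate a = rnormal()
bysort code (year): replace a = a[1]      // firm effect, constant within firm
generate x = rnormal() + 0.5 * a
generate y = 1 + 2 * x + a + rnormal()
xtset code year

xtreg y x, fe
display "corr(u_i, Xb) = " e(corr)        // the value shown in the xtreg header
estimates store fixed

xtreg y x, re
estimates store random

* H0: the individual effect is uncorrelated with x (random effects is consistent)
hausman fixed random, sigmamore
scalar h_df = r(df)
display "chi2 = " r(chi2) "   p = " r(p)
```

A small p-value rejects H0 and points toward the fixed effects model.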

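F(55,3365): 55 is the numerator degrees of freedom, i.e. the number of coefficients tested jointly (k). 3365 is the denominator degrees of freedom. It is not n-k here: with vce(cluster code), Stata uses the number of clusters minus 1, so the sample should contain 3366 firms.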
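The denominator degrees of freedom can be read from the stored results after estimation. A minimal sketch on simulated data (hypothetical, 50 firms, so 49 is expected as the denominator):

```stata
* Simulated panel (hypothetical data): 50 firms, 6 years each
clear
set seed 12345
set obs 300
generate code = ceil(_n / 6)
bysort code: generate year = 2009 + _n
generate x = rnormal()
generate y = 1 + 2 * x + rnormal()
xtset code year

xtreg y x i.year, fe vce(cluster code)
display "number of clusters (firms): " e(N_clust)
display "F denominator df:           " e(df_r)
```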

Prob > F = 0.0000: the null hypothesis of the F test is that the coefficients on all explanatory variables are jointly zero. The null hypothesis is rejected, so the regressors are jointly significant.

Coefficient is the estimated regression coefficient.

Robust std. err. refers to the cluster-robust (cluster-adjusted) standard error.

For an introduction, see the referenced article "What does the cluster-robust standard error adjust?". It explains the common misunderstandings about cluster-adjusted standard errors very thoroughly, and readers with an econometrics background may want to take a look.

The t value tests each regression coefficient individually; H0 is that the coefficient on the variable is zero. P>|t| is the p-value of this test: if it is small (for example below 0.05), observing such a t value under H0 would be a small-probability event, so the null hypothesis is rejected.

95% Conf. Interval is the 95% confidence interval of the coefficient.
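The t value, p-value, and confidence interval in the table can be recomputed by hand from the coefficient and its standard error. A sketch on simulated data (hypothetical; the scalar names t_x, p_x, ci_lo, ci_hi are made up for illustration):

```stata
* Simulated panel (hypothetical data): 50 firms, 6 years each
clear
set seed 12345
set obs 300
generate code = ceil(_n / 6)
bysort code: generate year = 2009 + _n
generate x = rnormal()
generate y = 1 + 2 * x + rnormal()
xtset code year

xtreg y x, fe vce(cluster code)
matrix T = r(table)                       // the coefficient table as a matrix

scalar t_x   = _b[x] / _se[x]                              // t = coefficient / std. err.
scalar p_x   = 2 * ttail(e(df_r), abs(t_x))                // two-sided p-value
scalar ci_lo = _b[x] - invttail(e(df_r), 0.025) * _se[x]   // lower 95% bound
scalar ci_hi = _b[x] + invttail(e(df_r), 0.025) * _se[x]   // upper 95% bound

display "t = " t_x "   p = " p_x "   95% CI = [" ci_lo ", " ci_hi "]"
```

These numbers should match the row for x in the regression output.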


xtreg gap gtp size lev roa labor age cash indratio top1 soe i.year i.ind ,fe vce(cluster code)
est store m2

On the basis of m1, m2 adds firm-level control variables (size lev roa labor age cash indratio top1 soe)


xtreg gap gtp size lev roa labor age cash indratio top1 soe olddep avgwage lnpgdp i.year i.ind i.prov,fe vce(cluster code)
est store m3

m3 further controls province fixed effects and regional economic development characteristic variables (olddep avgwage lnpgdp)

Stata reports a collinearity problem here, and I don't know how to handle it yet. I will ask my advisor next time (●'◡'●)
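One possible explanation, which has not been checked against the paper's data: if a firm's industry or province never changes over the sample, the i.ind and i.prov dummies are constant within each firm and are already absorbed by the firm fixed effects, so Stata omits them. The sketch below reproduces that situation on simulated data (hypothetical; ind is constructed to be time-invariant).

```stata
* Simulated panel (hypothetical data): 50 firms, 6 years each
clear
set seed 12345
set obs 300
generate code = ceil(_n / 6)
bysort code: generate year = 2009 + _n
generate ind = mod(code, 3) + 1           // industry never changes within a firm
generate x = rnormal()
generate y = 1 + 2 * x + 0.5 * ind + rnormal()
xtset code year

xtreg y x i.year, fe vce(cluster code)
scalar b_without = _b[x]

* the industry dummies are reported as omitted because of collinearity
xtreg y x i.ind i.year, fe vce(cluster code)
scalar b_with = _b[x]

display "without i.ind: " b_without "   with i.ind: " b_with
```

In this setup the coefficient on x is the same with and without i.ind, so the omission message is harmless; whether that also holds for the paper's data depends on how many firms change industry or province.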

esttab

esttab m1 m2 m3 using 基准回归.rtf,scalar(N r2_a) drop(*.year *.prov *.ind) compress star(* 0.1 ** 0.05 *** 0.01) mtitles nogap b(%6.4f) t(%6.4f)

esttab m1 m2 m3 using 基准回归.rtf: writes the stored regression results m1, m2, m3 side by side into one table and saves it as an RTF file

scalar(N r2_a) displays the sample size and the adjusted R2 at the bottom of the table

drop(*.year *.prov *.ind) removes the year, province, and industry dummies from the table

compress makes the table more compact

star(* 0.1 ** 0.05 *** 0.01) marks the significance levels with stars

mtitles specifies the model titles shown above the columns; used without a title list, as here, the names of the stored estimates (m1, m2, m3) are used

nogap removes the blank lines between coefficients

b(%6.4f) is the display format of the coefficients: a fixed-point number with four decimal places in a field at least six characters wide; if the result is shorter, it is padded with spaces. t(%6.4f) applies the same format to the t statistics.
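The format and the table layout can be tried out on Stata's built-in auto dataset. This is only a sketch: the regressions are arbitrary examples and have nothing to do with the paper, and esttab is part of the user-written estout package, so the table is only produced if that package is installed.

```stata
* %w.df format: total width w, d digits after the decimal point
display %6.4f 3.14159265                  // shows 3.1416
display "[" %8.4f 0.5 "]"                 // shorter results are padded with spaces

* Two arbitrary example regressions on the built-in auto data
sysuse auto, clear
regress price mpg
estimates store m1
regress price mpg weight
estimates store m2

* esttab comes from the estout package (ssc install estout)
capture which esttab
if _rc == 0 {
    esttab m1 m2, scalars(N r2_a) compress star(* 0.1 ** 0.05 *** 0.01) ///
        mtitles nogap b(%6.4f) t(%6.4f)
}
else {
    display "esttab not found; install it with: ssc install estout"
}
```

Without the using part, esttab prints the table to the Results window, which is handy for checking the options before writing the RTF file.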

Origin blog.csdn.net/weixin_50381726/article/details/128203428