Dynamic panel case analysis

Dynamic panel model analysis

A panel model whose explanatory variables include lagged values of the explained variable is called a "dynamic panel model". The lagged explained variable is endogenous, and dynamic panel estimators are designed to deal with this endogeneity problem. The development of dynamic panel models can be divided into three stages. The first stage is the difference GMM proposed by Arellano and Bond (1991), the second stage is the level GMM proposed by Arellano and Bover (1995), and the third stage is the system GMM proposed by Blundell and Bond (1998), which combines difference GMM and level GMM. SPSSAU currently provides two estimators, difference GMM and system GMM. In most cases, system GMM is used. Note that dynamic panel models usually target only 'large N, small T' panel data. If T is too large, the number of lag terms grows quickly, and there may be too many parameters to estimate for the model to be fitted.

Dynamic panel models usually involve several variables, which are explained as follows:

When system GMM uses lagged terms of the explained variable as explanatory variables, one principle is to keep adding lags for as long as they remain significant. For example, suppose the lag 1 and lag 2 terms of the explained variable are used as explanatory variables and both are significant, but after the lag 3 term is added, lag 3 is not significant while lag 1 and lag 2 are still significant. In that case, only lag 1 and lag 2 are generally used as explanatory variables, and lag 3 is not used. Lag 1 and lag 2 must be used together, not lag 1 alone or lag 2 alone: because the results show that both are significant, using only one of them would artificially create an omitted variable problem.
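The lag selection principle above can be sketched as a small rule. This is only an illustration: the function name `choose_max_lag`, the 0.05 threshold, and the p-values are assumptions made up for the example, not SPSSAU output.

```python
def choose_max_lag(pvalues_by_model, alpha=0.05):
    """Pick the largest lag order p for which every lag 1..p is significant.

    pvalues_by_model maps a candidate maximum lag p to the list of p-values
    of the lag 1..p terms in the model estimated with that maximum lag.
    Returns 0 if even the lag 1 model has a non-significant lag term.
    """
    chosen = 0
    for max_lag in sorted(pvalues_by_model):
        if all(p < alpha for p in pvalues_by_model[max_lag]):
            chosen = max_lag
        else:
            # A non-significant lag appeared: stop and keep the previous order.
            break
    return chosen


# Hypothetical p-values, invented for illustration only.
candidate_models = {
    1: [0.001],                # lag 1 significant
    2: [0.002, 0.010],         # lag 1 and lag 2 significant
    3: [0.003, 0.020, 0.400],  # lag 3 not significant
}

selected = choose_max_lag(candidate_models)
print(f"Use lags 1..{selected} of the explained variable")
```

With these made-up values the rule stops at lag 2, so lag 1 and lag 2 enter the model together and lag 3 is left out.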

System GMM leaves a lot of freedom in choosing which lagged terms of the explained variable and the explanatory variables serve as instrumental variables (IV), as long as the model passes the two tests of system GMM: the Hansen over-identification test and the AR test for no autocorrelation in the disturbance term. The Hansen over-identification test examines whether the instrumental variables are all exogenous. If its p value is greater than 0.05, the instrumental variables are considered exogenous. The model also needs to pass the AR test, which examines whether the disturbance term is free of autocorrelation. Generally, if the p value for AR(2) is greater than 0.05, the null hypothesis is accepted and the model passes the autocorrelation test. When building a dynamic panel model, setting the instrumental variable parameters is particularly complicated, but in any case the model is usable only if it passes both the Hansen over-identification test and the AR test. In actual research it is therefore recommended to let SPSSAU configure the parameters automatically, that is, to let the system identify and find the best model. When SPSSAU cannot find the best model automatically, the parameters can be set and adjusted one by one based on the data at hand and professional knowledge.

1  Background

This case uses panel data on the wages of 595 American workers from 1976 to 1982 (N=595, T=7). The explained variable is the logarithm of wages, and there are 11 explanatory variables: length of service, number of weeks worked, whether blue collar, whether in a manufacturing job, whether living in the South of the United States, whether living in a big city, whether married, whether female, whether a union member, years of education, and whether black. Some of the data are as follows:

2  Theory

The dynamic panel model is a method for dealing with endogeneity problems. The lagged term of the explained variable is included in the model as an explanatory variable, and instrumental variables are set for estimation. Setting the instrumental variables is fairly complicated, and two tests need special attention: the Hansen over-identification test and the AR test for no autocorrelation in the disturbance term.

3  Operations

The operation screenshots for this example are as follows. The system GMM method is used, with OneStep selected as the GMM estimation step. In the figure below, 'Compress instrumental variables' means compressing (reducing) the number of instrumental variables. This option can be selected when there are many variables under study and therefore potentially many instrumental variables. In this case, the option is not selected.

Regarding 'setting data': in a dynamic panel model, lags of the explained variable are used as explanatory variables, so how many lag orders should be included? This can be set manually (for example, lag 1 to lag 2), or 'intelligent identification' can be chosen, in which case the system automatically runs different lag orders and determines the optimal lag order by combining the results of the Hansen over-identification test and the AR test. Lags of the explanatory variables can also be used as explanatory variables, but this is rare in actual research. The lag order of the explanatory variables is therefore set from order 0 to order 0, meaning that their lags are not included in the model.
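To make "lag 1 to lag 2 of the explained variable" concrete, here is a minimal sketch of how such lag columns could be built for panel data. It is an illustration under assumptions: the row layout, the column names `id`, `year` and `y`, and the toy values are invented, and the time index is assumed to be yearly with consecutive integers. It does not reproduce SPSSAU's internal processing.

```python
def add_lags(rows, max_lag):
    """Attach y_lag1..y_lag{max_lag} to each row of a long-format panel.

    rows: list of dicts with keys 'id' (individual), 'year' and 'y'.
    Lags are looked up within the same individual only, so the first
    periods of each individual get None instead of another person's value.
    """
    by_key = {(r["id"], r["year"]): r["y"] for r in rows}
    lagged = []
    for r in rows:
        new_row = dict(r)
        for k in range(1, max_lag + 1):
            new_row[f"y_lag{k}"] = by_key.get((r["id"], r["year"] - k))
        lagged.append(new_row)
    return lagged


# Toy log-wage values, invented for illustration only.
rows = [
    {"id": 1, "year": 1976, "y": 5.5},
    {"id": 1, "year": 1977, "y": 5.7},
    {"id": 1, "year": 1978, "y": 6.0},
    {"id": 2, "year": 1976, "y": 6.1},
    {"id": 2, "year": 1977, "y": 6.2},
    {"id": 2, "year": 1978, "y": 6.4},
]

lagged = add_lags(rows, max_lag=2)
for row in lagged:
    print(row)
```

Each additional lag order costs one usable period per individual, which is one reason the lag order stays small when T is only 7.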

The next step is setting the instrumental variables, which is more complicated. By default, as many lag orders as possible are used as instrumental variables. The settings can be modified in actual research, but it is recommended to first keep the default, which includes all available lag orders, and let the system check whether the model passes the Hansen over-identification test and the AR test. If it passes, the model can be used. If it does not, it is recommended to set the lag orders one by one based on professional knowledge.

Finally, in the figure below, the option 'time term dummy variables are put into the model' means that the time term is converted into dummy variables and included in the model. This option is not selected by default and can be selected if necessary.
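As a rough illustration of what converting the time term into dummy variables involves, the sketch below builds one 0/1 column per year. The function name, column naming, and the choice to drop the first year as the reference category are assumptions for the example; how SPSSAU codes these dummies internally is not stated in this article.

```python
def add_time_dummies(rows, time_key="year"):
    """Add one 0/1 dummy per time period, leaving out the earliest period.

    The earliest period is the reference category, so the dummies are not
    perfectly collinear with a constant term.
    """
    periods = sorted({r[time_key] for r in rows})
    non_reference = periods[1:]
    out = []
    for r in rows:
        new_row = dict(r)
        for p in non_reference:
            new_row[f"{time_key}_{p}"] = 1 if r[time_key] == p else 0
        out.append(new_row)
    return out


# Toy rows, invented for illustration only.
rows = [
    {"id": 1, "year": 1976},
    {"id": 1, "year": 1977},
    {"id": 1, "year": 1978},
]

with_dummies = add_time_dummies(rows)
for row in with_dummies:
    print(row)
```

With T=7 as in this case, this would add six dummy columns, one for each year after 1976.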

4  SPSSAU output results

The dynamic panel model outputs 5 tables in total, described as follows:

5  Text analysis

The table above displays model information, including the GMM type, whether OneStep or TwoStep estimation is used, and the information criteria (AIC, BIC and HQIC values). It also displays the model code. In the model code, L represents the lag order, for example, L1 represents the first-order lag. gmm represents a GMM-type instrumental variable, and '1:.' in the brackets represents lags from order 1 up to the highest available order. An IV-type instrumental variable is written as iv(analysis item). If the instrumental variables are compressed, a 'collapse' parameter appears.

The above table displays information such as the number of instrumental variables, total sample size, time term, and number of groups. These values are affected by parameters such as the lag order of the instrumental variables and whether the instrumental variables are compressed, and they carry little interpretive meaning on their own.

If the lag order of the explained variable is set to 'intelligent identification', the system runs and compares multiple models and outputs the optimal one. ('Intelligent identification' is also available when lags of the explanatory variables are included in the model, although this is rarely done.) The criterion is that the optimal model passes both the Hansen over-identification test and the AR test. The table above shows that intelligent identification of the lag order succeeded, which means that the current model passes the Hansen over-identification test and the AR test.
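The idea of running several lag orders and keeping one that passes both tests can be sketched as follows. This is a guess at the general logic, not SPSSAU's actual algorithm: the function name, the rule of taking the smallest passing lag order, and the p-values for lag orders 1 and 3 are assumptions. Only the lag 2 values (0.992 and 0.063) come from this case.

```python
def pick_lag_order(results, alpha=0.05):
    """Return the smallest candidate lag order whose model passes both tests.

    results: list of dicts with keys 'max_lag', 'hansen_p' and 'ar2_p'.
    A model passes when both p-values are above alpha, i.e. neither the
    exogeneity of the instruments nor the absence of second-order
    autocorrelation is rejected. Returns None if no candidate passes.
    """
    for res in sorted(results, key=lambda r: r["max_lag"]):
        if res["hansen_p"] > alpha and res["ar2_p"] > alpha:
            return res["max_lag"]
    return None


results = [
    {"max_lag": 1, "hansen_p": 0.300, "ar2_p": 0.010},  # hypothetical values
    {"max_lag": 2, "hansen_p": 0.992, "ar2_p": 0.063},  # values reported in this case
    {"max_lag": 3, "hansen_p": 0.950, "ar2_p": 0.200},  # hypothetical values
]

best = pick_lag_order(results)
print("Selected lag order:", best)
```

When the function returns None, no candidate passed, which corresponds to the situation where the parameters have to be adjusted by hand.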

The table above shows the results of the dynamic panel model, which are interpreted in the same way as an ordinary regression. First check whether an item is significant. If it is, check its regression coefficient (Coef): a value greater than 0 indicates a positive effect, and a value less than 0 indicates a negative effect. In this case, lag 1 and lag 2 of the explained variable are included in the model, and both are significant. Blue collar and manufacturing job are also both significant. The regression coefficient of blue collar is -0.069<0, which means that blue-collar workers have relatively lower wages, and the regression coefficient of manufacturing job is 0.053>0, which means that workers in manufacturing have relatively higher wages.
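Because the explained variable is the logarithm of wages, a dummy variable's coefficient can be turned into an approximate percentage difference in wages with exp(coef) - 1. The sketch below applies this standard conversion to the two coefficients reported above. The function name is an assumption for the example, and the result is a short-run effect that holds the lagged wage terms fixed.

```python
import math


def pct_effect(coef):
    """Percentage change in the wage implied by a coefficient in a log-wage model."""
    return (math.exp(coef) - 1.0) * 100.0


blue_collar = pct_effect(-0.069)   # coefficient reported for blue collar
manufacturing = pct_effect(0.053)  # coefficient reported for manufacturing job

print(f"Blue collar:   {blue_collar:.2f}%")    # about -6.7%
print(f"Manufacturing: {manufacturing:.2f}%")  # about +5.4%
```

For coefficients this small, the percentage is close to the coefficient multiplied by 100, which is why -0.069 is often read directly as roughly 7% lower wages.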

For the Hansen over-identification test, the null hypothesis is that the instrumental variables are uncorrelated with the error term. From the table above, the model does not reject the null hypothesis of the Hansen over-identification test (p=0.992>0.05), which means that the instrumental variables are uncorrelated with the error term and the current model is well constructed.

The null hypothesis of the AR test is that the model has no autocorrelation. The test is usually carried out on AR(2). If the p value for AR(2) is greater than 0.05, the null hypothesis is accepted and the model passes the autocorrelation test. Otherwise, the null hypothesis is rejected, which means that the model has autocorrelation. From the table above, the model accepts the null hypothesis of the AR(2) test (p=0.063>0.05), which means that there is no autocorrelation in the model and the current model is well constructed.
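The two decisions above can be written as one small check. This is a minimal sketch: the function and key names are assumptions for the example, and the p-values are the ones reported in this case.

```python
def check_diagnostics(hansen_p, ar2_p, alpha=0.05):
    """Summarize the two system GMM diagnostics at significance level alpha.

    For both tests, a p-value above alpha means the null hypothesis is not
    rejected, which is the outcome the model needs.
    """
    instruments_ok = hansen_p > alpha  # H0: instruments uncorrelated with the error term
    no_autocorr = ar2_p > alpha        # H0: no second-order autocorrelation
    return {
        "instruments_exogenous": instruments_ok,
        "no_second_order_autocorrelation": no_autocorr,
        "usable": instruments_ok and no_autocorr,
    }


report = check_diagnostics(hansen_p=0.992, ar2_p=0.063)
print(report)
```

Both checks pass here, although the AR(2) p-value of 0.063 is only slightly above the threshold. A Hansen p-value very close to 1 is also sometimes taken as a hint that the instrument count is large, so it may be worth rerunning with compressed instruments as a robustness check.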

In addition, SPSSAU also outputs a dynamic panel model result table in a simplified format for direct use in reports.

6  Analysis

Dynamic panel model analysis involves the following key points:

  • The dynamic panel model is suitable for data with large N and small T. If the time dimension of the panel data is too long, such as 30 years of data, the lag orders of the instrumental variables and the number of instrumental variables can easily become very large, which affects model construction. If T is long, it is recommended to use an ordinary panel model instead;
  • When setting the lag order of the explained variable, it is recommended to select 'intelligent identification' so that the system automatically runs different lag orders, compares them, and selects the optimal model;
  • In the setting of instrumental variables, the default is lag order 1 to all orders, so the number of instrumental variables will be very large. It is recommended either to select the option that compresses the number of instrumental variables or to set the lag orders based on professional knowledge.
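To see why a long T inflates the instrument count and what compressing does, the sketch below counts the GMM-style instruments for the lagged explained variable in the first-differenced equation only, assuming all available lags are used. It is a simplified textbook count under those assumptions, not the number SPSSAU reports, which also depends on the other variables and on the level equation in system GMM.

```python
def n_gmm_instruments(T, collapse=False):
    """Count GMM-style instruments for the lagged explained variable.

    Differenced equation, all available lags used: in period t (t = 3..T)
    the levels y_1 .. y_{t-2} are valid instruments, giving
    1 + 2 + ... + (T-2) instrument columns in total.
    With collapsed instruments there is one column per lag depth, i.e. T-2.
    """
    if T < 3:
        return 0
    if collapse:
        return T - 2
    return (T - 2) * (T - 1) // 2


for T in (7, 30):
    full = n_gmm_instruments(T)
    collapsed = n_gmm_instruments(T, collapse=True)
    print(f"T={T}: {full} instruments, {collapsed} after collapsing")
```

For T=7 this gives 15 instruments (5 when collapsed), while for T=30 it gives 406 (28 when collapsed): the uncollapsed count grows quadratically in T, the collapsed one only linearly.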

Origin blog.csdn.net/m0_37228052/article/details/133016911