A brief introduction to the generalized method of moments (GMM) and dynamic panel data

This article draws in part on an article by Zhang Haiyang of the School of Finance, University of International Business and Economics (https://max.book118.com/html/2018/1024/6133101022001224.shtm), which is itself based on the explanatory material in the original text by Roodman (2009), so Mr. Zhang's account carries a certain authority. What follows is my own restatement of Mr. Zhang's material and should be a little easier to understand. The meaning is essentially the same, but Mr. Zhang's formulas are more complete and more rigorous.

Here is an overview:
In my understanding (note by Wei Yanzhao), GMM is a way of estimating regression coefficients so that they are as free of bias as possible. Strictly speaking, it delivers consistent estimates, i.e. estimates whose bias vanishes as the sample grows.
In some regressions, traditional methods (such as first-differencing followed by OLS, or the fixed-effects model) produce coefficient estimates that are clearly biased, and the bias is not necessarily small. With first-differencing, the reason is that in the differenced equation the regressor (the lagged dependent variable) is correlated with the error term, i.e. their covariance is not 0, and OLS is biased whenever a regressor is correlated with the error (don't worry if this passage is unclear for now).
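A minimal sketch of this bias, assuming a hypothetical data-generating process y_it = ρ·y_i,t-1 + μ_i + ε_it with ρ = 0.5 and standard normal μ_i and ε_it (these values are illustrative, not from the source). It first-differences away μ_i, runs OLS of Δy_it on Δy_i,t-1, and reports the sample covariance between the differenced lag and the differenced error. The helper names `simulate_panel` and `fd_ols` are hypothetical.

```python
import random


def simulate_panel(n_units, n_periods, rho, seed=0, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + mu_i + eps_it (hypothetical DGP for illustration).

    Returns a list of (y, eps) pairs, one per individual, each of length n_periods.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        mu = rng.gauss(0.0, 1.0)          # individual fixed effect
        y_prev = mu / (1.0 - rho)         # start at the individual's long-run mean
        ys, es = [], []
        for t in range(burn_in + n_periods):
            eps = rng.gauss(0.0, 1.0)     # idiosyncratic error, sigma = 1
            y = rho * y_prev + mu + eps
            if t >= burn_in:              # drop burn-in so the process is near-stationary
                ys.append(y)
                es.append(eps)
            y_prev = y
        panel.append((ys, es))
    return panel


def fd_ols(panel):
    """OLS (no intercept) of dy_it on dy_i,t-1 after first-differencing away mu_i.

    Returns (rho_hat, sample covariance of dy_i,t-1 with d_eps_it).
    """
    sxy = sxx = 0.0
    lag_vals, err_vals = [], []
    for ys, es in panel:
        for t in range(2, len(ys)):
            dy = ys[t] - ys[t - 1]
            dy_lag = ys[t - 1] - ys[t - 2]
            d_eps = es[t] - es[t - 1]
            sxy += dy_lag * dy
            sxx += dy_lag * dy_lag
            lag_vals.append(dy_lag)
            err_vals.append(d_eps)
    m = len(lag_vals)
    mean_l = sum(lag_vals) / m
    mean_e = sum(err_vals) / m
    cov = sum((a - mean_l) * (b - mean_e) for a, b in zip(lag_vals, err_vals)) / (m - 1)
    return sxy / sxx, cov


if __name__ == "__main__":
    panel = simulate_panel(n_units=2000, n_periods=8, rho=0.5)
    rho_hat, cov = fd_ols(panel)
    print(f"true rho = 0.5, FD-OLS estimate = {rho_hat:.3f}")
    print(f"sample Cov(dy_lag, d_eps) = {cov:.3f}")
```

Under these assumptions the theoretical covariance is −σ² = −1, and in a stationary panel the FD-OLS estimate converges to (ρ − 1)/2 rather than ρ, so the bias here is large, not a small-sample nuisance.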
This is where GMM, the generalized method of moments, comes in. Its starting point is a set of moment conditions: the covariance between the error term of the equation and certain variables (exogenous regressors, or instruments such as deeper lags of the dependent variable) is assumed to be 0, i.e. they are uncorrelated. GMM chooses the coefficients that make the sample versions of these covariances as close to 0 as possible. When there are exactly as many moment conditions as coefficients, the sample conditions can be met exactly; when there are more, they can only be met as closely as possible, and the so-called Hansen and Sargan tests can then be performed. Given the estimated coefficients, these tests check whether the remaining sample covariances are scattered around 0 in the way that sampling noise alone would produce. If they are, the moment conditions (instruments) are not rejected and the estimates are considered trustworthy.
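The moment-condition idea and the Hansen test can be sketched on a simulated panel, again assuming a hypothetical DGP y_it = ρ·y_i,t-1 + μ_i + ε_it with ρ = 0.5 and iid N(0, 1) errors. The function `diff_gmm` (a hypothetical name, not a library API) estimates ρ in the first-differenced equation using two lagged levels of y as instruments, summed over periods for each individual. This is a simplified instrument set, not the full Arellano–Bond instrument matrix. It runs two-step GMM and returns Hansen's J statistic, which is approximately χ² with (moment conditions − coefficients) = 1 degree of freedom when the moment conditions hold.

```python
import random


def simulate_y(n_units, n_periods, rho, seed=0, burn_in=50):
    """Hypothetical DGP: y_it = rho*y_i,t-1 + mu_i + eps_it with iid N(0,1) errors."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        mu = rng.gauss(0.0, 1.0)
        y, ys = mu / (1.0 - rho), []
        for t in range(burn_in + n_periods):
            y = rho * y + mu + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                ys.append(y)
        panel.append(ys)
    return panel


def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def quad(u, m, v):
    """u' M v for 2-vectors u, v and a 2x2 matrix M."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def diff_gmm(panel, lags=(2, 3)):
    """Two-step GMM for rho in dy_it = rho*dy_i,t-1 + d_eps_it.

    Instruments: the levels y_i,t-l for the two lags l in `lags`.
    Returns (rho_hat, hansen_j).
    """
    start = max(max(lags), 2)
    a_i, b_i = [], []  # per-individual sums of z*dy_lag and z*dy
    for ys in panel:
        a, b = [0.0, 0.0], [0.0, 0.0]
        for t in range(start, len(ys)):
            dy, dy_lag = ys[t] - ys[t - 1], ys[t - 1] - ys[t - 2]
            for k, lag in enumerate(lags):
                a[k] += ys[t - lag] * dy_lag
                b[k] += ys[t - lag] * dy
        a_i.append(a)
        b_i.append(b)
    n = len(panel)
    A = [sum(a[k] for a in a_i) / n for k in range(2)]
    B = [sum(b[k] for b in b_i) / n for k in range(2)]

    def estimate(w):
        # Minimiser over scalar rho of (B - rho*A)' W (B - rho*A)
        return quad(A, w, B) / quad(A, w, A)

    rho1 = estimate(((1.0, 0.0), (0.0, 1.0)))  # step 1: identity weight
    g = [[b[k] - rho1 * a[k] for k in range(2)] for a, b in zip(a_i, b_i)]
    s = tuple(tuple(sum(gi[r] * gi[c] for gi in g) / n for c in range(2)) for r in range(2))
    w2 = inv2(s)                               # step 2: robust optimal weight
    rho2 = estimate(w2)
    gbar = [B[k] - rho2 * A[k] for k in range(2)]
    hansen_j = n * quad(gbar, w2, gbar)
    return rho2, hansen_j


if __name__ == "__main__":
    panel = simulate_y(2000, 8, rho=0.5)
    print(diff_gmm(panel, lags=(2, 3)))  # y_t-2, y_t-3 uncorrelated with d_eps_t
    print(diff_gmm(panel, lags=(1, 2)))  # y_t-1 is correlated with d_eps_t
```

With lags (2, 3) the moment conditions hold, so ρ̂ should be close to 0.5 and J should usually be small. Swapping in y_i,t-1, which is correlated with Δε_it, should inflate J sharply, which is the kind of failure the test is meant to flag.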

So under what circumstances do the traditional ways of estimating regression coefficients fail?
The answer is when facing dynamic panel data.
What is dynamic panel data?
(1) Dynamic: the model includes a lag of the dependent variable;
(2) There are individual fixed effects;
(3) The error term εit, apart from the fixed effects, may be heteroscedastic and serially correlated;
(4) Some independent variables may be endogenous;
(5) The error terms εit and εjt of different individuals are uncorrelated;
(6) There may be predetermined variables that are not strictly exogenous;
(7) "Large N, small T": there should be many individuals but only a few time periods. If the time dimension is long enough, the dynamic panel bias is small and the fixed-effects model can simply be used.
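Item (7) can be checked with a small simulation under an assumed DGP (y_it = ρ·y_i,t-1 + μ_i + ε_it, ρ = 0.5, iid N(0, 1) errors; these values are illustrative). The hypothetical helper `within_estimate` is the ordinary fixed-effects (within) estimator of ρ. Comparing a short panel with a long one shows its downward bias (the Nickell bias, of order 1/T) shrinking as T grows.

```python
import random


def simulate_y(n_units, n_periods, rho, seed=0, burn_in=50):
    """Hypothetical DGP: y_it = rho*y_i,t-1 + mu_i + eps_it with iid N(0,1) errors."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        mu = rng.gauss(0.0, 1.0)
        y, ys = mu / (1.0 - rho), []
        for t in range(burn_in + n_periods):
            y = rho * y + mu + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                ys.append(y)
        panel.append(ys)
    return panel


def within_estimate(panel):
    """Fixed-effects (within) OLS of y_it on y_i,t-1: demean both by individual."""
    sxy = sxx = 0.0
    for ys in panel:
        y, x = ys[1:], ys[:-1]
        my, mx = sum(y) / len(y), sum(x) / len(x)
        for yt, xt in zip(y, x):
            sxy += (xt - mx) * (yt - my)
            sxx += (xt - mx) ** 2
    return sxy / sxx


if __name__ == "__main__":
    for n_units, n_periods in [(2000, 5), (200, 50)]:
        est = within_estimate(simulate_y(n_units, n_periods, rho=0.5, seed=n_periods))
        print(f"T = {n_periods:2d}: FE estimate = {est:.3f} (true 0.5)")
```

The short panel has far more individuals, yet its estimate stays well below 0.5, because the bias comes from the small T, not from too few observations.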
(These contents about dynamic panel data are extracted from Mr. Zhang's article https://max.book118.com/html/2018/1024/6133101022001224.shtm)
The key features above are (1) and (2): the model contains both a lag of the dependent variable and individual fixed effects. The fixed effects are not the quantities we want to estimate and are unknown, so they are usually eliminated, for example by first-differencing. After first-differencing, however, the lagged dependent variable in the differenced equation is correlated with the differenced error term (their covariance is not 0), so OLS on the differenced data is biased. The fixed-effects model suffers from the same kind of correlation.
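For the simplest case with no other regressors, and assuming the εit are serially uncorrelated with variance σ² and uncorrelated with μ_i and with earlier values of y, the covariance that breaks OLS after differencing can be written out explicitly. The same derivation shows which lags remain valid instruments:

```latex
% Simplest dynamic panel: no other regressors
y_{it} = \rho\, y_{i,t-1} + \mu_i + \varepsilon_{it}

% First-differencing removes \mu_i
\Delta y_{it} = \rho\, \Delta y_{i,t-1} + \Delta\varepsilon_{it}

% Regressor and error share \varepsilon_{i,t-1}
\operatorname{Cov}(\Delta y_{i,t-1}, \Delta\varepsilon_{it})
  = \operatorname{Cov}(y_{i,t-1} - y_{i,t-2},\; \varepsilon_{it} - \varepsilon_{i,t-1})
  = -\operatorname{Cov}(y_{i,t-1}, \varepsilon_{i,t-1})
  = -\sigma^2 \neq 0

% Levels two or more periods back are uncorrelated with \Delta\varepsilon_{it}
E\left[\, y_{i,t-s}\, \Delta\varepsilon_{it} \,\right] = 0, \qquad s \ge 2
```

The last line gives the moment conditions used by difference GMM (Arellano–Bond): levels of y dated t−2 and earlier serve as instruments for the differenced equation. If εit is serially correlated, as item (3) allows, the usable lags start further back.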
Given the definition and properties of GMM above, it follows naturally that GMM is an appropriate tool for analysing dynamic panel data.

The formulas belong here, but for lack of time I won't repeat them. Please refer to the following source, where they are set out in detail: https://max.book118.com/html/2018/1024/6133101022001224.shtm
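As a compact stand-in for the omitted formulas, here is the standard linear GMM estimator and Hansen's J statistic in generic notation. Here y stacks the (differenced) dependent variable, X the regressors, Z the L instruments, Z_i and ê_i the rows and residuals of individual i, W a weighting matrix, and K the number of coefficients. The detailed dynamic-panel versions are in the source linked above.

```latex
% Generic linear GMM: L instruments, K coefficients, L >= K
\hat{\beta}_{\mathrm{GMM}} = \left( X' Z\, W Z' X \right)^{-1} X' Z\, W Z' y

% Two-step weight, built from first-step residuals \hat{e}_i of individual i
\hat{S} = \sum_{i=1}^{N} Z_i' \hat{e}_i \hat{e}_i' Z_i, \qquad W = \hat{S}^{-1}

% Hansen J test of the L - K overidentifying restrictions, using two-step residuals \hat{e}
J = (Z'\hat{e})'\, \hat{S}^{-1} (Z'\hat{e}) \;\overset{a}{\sim}\; \chi^2_{L-K}
```

The Sargan statistic has the same form but uses a weight matrix that assumes homoskedastic errors, so unlike Hansen's J it is not robust to heteroskedasticity.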


Origin blog.csdn.net/nvsirgn/article/details/108654451