An encyclopedic guide to spatial econometrics: the one cheat sheet you need to master it


Paid contributions to the Econometrics Circle are welcome; any econometrics-related topic is accepted.


All do files, micro-level datasets, and software used in the Econometrics Circle's methodology posts are stored in the community. You are welcome to visit the Spatial Econometrics Research Group and exchange ideas. For the complete do file and dataset, see the relevant post.


Today, our "Spatial Econometrics Research Group" presents members of the Econometrics Circle with a complete set of spatial econometric methods and related instructions. The Circle has previously recommended two introductory articles on spatial econometrics: "Spatial effect model selection, estimation, weighting, test (Spatial effect)" and "The latest progress and theoretical framework of spatial econometrics, comprehensive track". Our group has not existed for long and may not yet be as well known as the Circle's "Causal Inference Research Group". However, the rapid development of spatial econometrics has drawn many researchers at home and abroad to join us. If you are interested in spatial econometrics and have the basic background, consider joining our group.

What is spatial econometrics, anyway?

Spatial econometrics is, in essence, the regression methods we already use with spatial effects added. A spatial effect is essentially a network effect. It reflects the idea that everything is related to everything else. Of course, the closer two objects are, the stronger their connection is likely to be. Otherwise you would have to believe that every time a butterfly in Latin America flaps its wings a few times, an earthquake or flood must follow in China. Think back to when the United States was preparing to attack North Korea. Because China understood that "when the lips are gone, the teeth feel the cold", it sent large numbers of volunteers across the Yalu River to assist the Kim family. Today, however, physical distance may matter less than economic distance. For example, China and South Korea are close neighbors, yet South Korea and the United States may be closer economically and militarily (if you disagree, please ignore this remark).

Two examples show how spatial econometrics is used in economics. First, Beijing's development affects that of Hebei and Tianjin through neighborhood effects. For example, the popular intuition is that Hebei and Tianjin have lagged behind in relative terms. Second, suppose we want to study the impact of the government's home-purchase restriction policy on housing prices. We could collect several years of panel data on cities across the country that have introduced purchase restrictions and run an ordinary xtreg regression. However, a purchase restriction introduced by one city affects not only local housing prices but also prices in other cities, for example through population mobility. Moreover, house prices in one city may directly affect house prices in another. We therefore need to account for this continuous, tangled network of relationships. Doing so splits the impact of the purchase restriction policy on housing prices into two parts: a direct impact and an indirect impact.
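To see where the direct/indirect split comes from, here is the standard spatial autoregressive (SAR) model in matrix form. The notation (outcome vector y, regressor matrix X, weighting matrix W, spatial coefficient ρ, coefficients β) is ours, used for illustration:

```latex
\begin{aligned}
y &= \rho W y + X\beta + \varepsilon \\
y &= (I_n - \rho W)^{-1} X\beta + (I_n - \rho W)^{-1}\varepsilon \\
(I_n - \rho W)^{-1} &= I_n + \rho W + \rho^2 W^2 + \cdots \quad (\text{e.g. } |\rho| < 1 \text{ with row-standardized } W) \\
\frac{\partial y_i}{\partial x_{jk}} &= \big[(I_n - \rho W)^{-1}\big]_{ij}\,\beta_k
\end{aligned}
```

The diagonal entries (i = j) give the direct effect of city i's own policy on its own house prices. The off-diagonal entries give the spillovers to other cities. The ρ²W² and higher terms carry the effect on to neighbors of neighbors, which is the "continuous, tangled network" described above.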

Spatial econometrics can handle cross-sectional data, panel data, and endogenous variables, although the types of endogeneity it currently handles are limited. It is a relatively new field, and many of its theories and methods are still being developed. The rest of this article therefore focuses on a few of the most widely used spatial econometric models.

Data for spatial regression

For spatial regression, the first thing we need is a shapefile. A shapefile is a bundle of files that stores geographic information such as latitude and longitude. Once we have one, we use spshape2dta to convert it into a format Stata can read. A shapefile contains only geographic information, so we first need to give each place a unique identifier (its ID). This ID lets us merge in the economic and social data we will use later. For example, suppose we want to study the relationship between average education level and economic development using county-level data for Guangdong. We need the shapefile for Guangdong's counties, and we give each county a unique ID (usually based on city + county). We then merge Guangdong's county-level statistical yearbook data with the shapefile one-to-one on this ID. The resulting dataset is the sample for the spatial econometric analysis.
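The merge step is where most data problems surface, so here is a minimal sketch of what a one-to-one merge on a unique ID should check. It is in the spirit of Stata's `merge 1:1`, written in plain Python. The `make_id` helper (city + county, since county names can repeat across cities) and all records are hypothetical toy values, not real Guangdong data:

```python
def make_id(city: str, county: str) -> str:
    """Hypothetical ID scheme: city name + county name, as suggested in the text."""
    return f"{city}|{county}"


def merge_one_to_one(geo, stats, key="id"):
    """Merge two lists of records 1:1 on `key`.

    Returns (matched rows, keys found only in geo, keys found only in stats).
    Raises ValueError on a duplicate key, because a 1:1 merge needs the ID
    to be unique on both sides.
    """
    def index(rows, side):
        out = {}
        for row in rows:
            k = row[key]
            if k in out:
                raise ValueError(f"duplicate {key}={k!r} in {side}")
            out[k] = row
        return out

    geo_idx, stats_idx = index(geo, "geo"), index(stats, "stats")
    matched = [{**geo_idx[k], **stats_idx[k]} for k in geo_idx if k in stats_idx]
    only_geo = [k for k in geo_idx if k not in stats_idx]
    only_stats = [k for k in stats_idx if k not in geo_idx]
    return matched, only_geo, only_stats


# Toy records with made-up names and values
geo = [
    {"id": make_id("CityA", "County1"), "_CX": 1.0, "_CY": 1.0},
    {"id": make_id("CityA", "County2"), "_CX": 2.0, "_CY": 1.0},
    {"id": make_id("CityB", "County1"), "_CX": 1.0, "_CY": 2.0},
]
stats = [
    {"id": make_id("CityA", "County1"), "edu_years": 9.5},
    {"id": make_id("CityA", "County2"), "edu_years": 8.7},
    {"id": make_id("CityC", "County9"), "edu_years": 10.2},
]
matched, only_geo, only_stats = merge_one_to_one(geo, stats)
print(len(matched), only_geo, only_stats)
```

Always inspect the unmatched lists: a county present in the shapefile but missing from the yearbook (or the reverse) usually signals an ID inconsistency, not a genuinely missing county.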

The following is an example of shapefile data. It contains only geographic information: the latitude, longitude, and name of each place. We therefore first need to convert it into usable data, as described above.


Below, we use examples to walk through the spatial econometric methods recommended today.

Cross-sectional spatial regression

Our first example uses cross-sectional data. _ID uniquely identifies each observation. _CX and _CY are the coordinates (longitude and latitude) of each county's centroid in the shapefile. cname and sname are the county and state names. For instance, _ID=8 is the observation for Menard County, Texas. The observations do not have to be places: you can think of them as the individuals you are used to working with. Spatial econometrics can therefore also be used to study neighborhood effects, network effects, or peer effects. After all, these models all capture spillover effects.


As noted earlier, each county needs a unique ID (state code + county code), which we call fips below. We use this ID to merge the dataset we are interested in with the shapefile one-to-one. The dataset below is the merged cross-sectional data (variables unrelated to our topic have been dropped). In this example we keep three variables:

- college: the county's college graduation rate
- income: the county's average income
- unemployment: the county's unemployment rate

We want to study the relationship between the college graduation rate and the unemployment rate, that is, whether counties with more college graduates have lower unemployment.


Since unemployment is our outcome, we first map the unemployment rate of every county in the dataset (darker shading means higher unemployment).


The next map shows each county's college graduation rate (darker shading means a higher rate). Does its shading pattern closely resemble the previous map?


We first run an ordinary OLS regression that ignores spatial correlation. The result shows that, controlling for income, a higher college graduation rate is associated with a lower unemployment rate (the coefficient on college is negative).


Next, we ask whether this regression exhibits spatial correlation, using a Moran test. The test rejects H0: error is i.i.d., which indicates strong spatial dependence in our regression.
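For intuition about what a Moran-type test measures, here is a sketch of the classic global Moran's I with permutation inference. Stata's own test after regress reports a chi-squared statistic and is not identical to this classic I, so treat the sketch as an illustration of the idea only. The 6-unit line of "counties" and its values are hypothetical:

```python
import numpy as np


def morans_i(z, W):
    """Classic global Moran's I of a variable (e.g. OLS residuals) under weight matrix W."""
    z = np.asarray(z, dtype=float)
    z = z - z.mean()                      # deviations from the mean
    s0 = W.sum()                          # sum of all weights
    return float(len(z) / s0 * (z @ W @ z) / (z @ z))


def moran_permutation_pvalue(z, W, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation: share of random relabelings
    whose Moran's I is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = morans_i(z, W)
    hits = sum(morans_i(rng.permutation(z), W) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: 6 units on a line; units sharing an edge are neighbors (binary contiguity)
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

clustered = np.array([1, 1, 1, -1, -1, -1])     # similar values sit together
alternating = np.array([1, -1, 1, -1, 1, -1])   # neighbors always differ
print(morans_i(clustered, W), morans_i(alternating, W))
print(moran_permutation_pvalue(clustered, W, n_perm=199))
```

By hand, the clustered pattern gives I = 0.6 and the alternating one gives I = -1.0. Under no spatial correlation the expected value is -1/(n - 1) = -0.2. Values well above that point to positive spatial dependence, which is what the county residuals show.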

Before running the Moran test, we need to create a spatial weighting matrix (SWM). There are many ways to build one, but the two most common are contiguity and inverse distance (translations of these terms vary, but the meaning should be clear). The SWM below was built with the contiguity method. Most of its entries are 0, meaning the two counties are not neighbors, so their variables do not affect each other directly. The software builds the matrix automatically. Its output is shown below.
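Here is a sketch of the two construction methods. It assumes a hypothetical 3 × 3 grid of counties whose cell centers serve as planar coordinates; real county maps come from the shapefile instead. Two normalizations are included. Row-standardization is the textbook convention. Stata's spmatrix also offers spectral normalization, which its manual documents as the default (check your version).

```python
import numpy as np


def rook_contiguity(rows, cols):
    """Binary contiguity on a rows x cols grid: 1 if two cells share an edge, else 0."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W


def inverse_distance(coords, cutoff=np.inf):
    """Weights 1/d_ij for distinct units within `cutoff`; 0 on the diagonal and beyond it."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):
        inv = 1.0 / d                     # inf on the diagonal, masked out below
    return np.where((d > 0) & (d <= cutoff), inv, 0.0)


def row_standardize(W):
    """Divide each row by its sum so each unit's weights add up to 1 (isolated units stay 0)."""
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W), where=s > 0)


def spectral_normalize(W):
    """Divide by the largest absolute eigenvalue, so the spectral radius becomes 1."""
    return W / np.max(np.abs(np.linalg.eigvals(W)))


# Hypothetical 3 x 3 grid of counties
W = rook_contiguity(3, 3)
coords = np.array([[c, r] for r in range(3) for c in range(3)], dtype=float)
M = inverse_distance(coords)
print("share of zero entries in W:", (W == 0).mean())
print(row_standardize(W)[4])   # center cell: 4 neighbors, each weighted 0.25
```

Even in this tiny example, 57 of the 81 contiguity entries are zero. With thousands of counties the matrix is overwhelmingly zeros, as in the output above.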

There are two estimators for spatial regressions: generalized spatial two-stage least squares (gs2sls) and maximum likelihood (ml). The main difference is that ML is more efficient when the errors are normally distributed. Otherwise, gs2sls is more robust (for example, under heteroskedasticity). gs2sls is essentially the GMM estimation we are used to: it forms a set of moment conditions and finds the parameter values that minimize the resulting objective function.
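The core instrumental-variables idea behind gs2sls can be sketched in a few lines. The sketch covers only the 2SLS step for a pure SAR model. The endogenous spatial lag Wy is instrumented by X, WX, and W²X, in the spirit of Kelejian and Prucha. Stata's gs2sls also handles the spatial error term and adds a further GMM step, so this is not its implementation. The data are simulated with known parameters on a hypothetical ring of 900 units:

```python
import numpy as np


def ring_weights(n, k=2):
    """Row-standardized weights: each unit on a circle neighbors the k units on each side."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(1, k + 1):
            W[i, (i + j) % n] = W[i, (i - j) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def sar_2sls(y, X, W):
    """2SLS for y = rho*W y + X beta + e; returns [rho, beta_0, beta_1, ...].

    X must have a constant in column 0. W y is endogenous, so it is instrumented by X
    plus W X and W^2 X, using non-constant columns only: with a row-standardized W,
    the spatial lag of a constant is the same constant.
    """
    WX = W @ X[:, 1:]
    H = np.column_stack([X, WX, W @ WX])               # instrument matrix
    Z = np.column_stack([W @ y, X])                     # regressors, W y first
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]    # first stage: project Z on H
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]     # second stage


# Simulated data with known parameters (an illustration, not the county data)
rng = np.random.default_rng(0)
n, rho, beta = 900, 0.5, np.array([1.0, 2.0])
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + rng.normal(size=n))

estimate = sar_2sls(y, X, W)
print("rho, beta0, beta1 =", np.round(estimate, 3))
```

Simply adding Wy to an OLS regression would be biased, because Wy contains the neighbors' errors, and those errors feed back into y. The instruments break that feedback.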

Below is the spatial autoregressive model estimated with gs2sls. The unemployment rates of other counties (the spatial lag of the dependent variable) enter as a regressor for each county, much like the autoregressive (AR) models we usually encounter. The Wald test on the highlighted (yellow) line shows that the spatial effect is real. The spatial term is significant, so a spatial regression should be used.


Below is the same spatial autoregressive model estimated by maximum likelihood. Comparing the coefficients from the two methods directly, we can still see some differences.
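The maximum-likelihood route can be sketched with the concentrated log-likelihood. For a fixed ρ, β and σ² have closed forms, so only a one-dimensional search over ρ remains. This is an illustration on simulated data with a hypothetical symmetric ring matrix. A grid search stands in for the numerical optimizer a real package would use, and the eigenvalue shortcut for log|I − ρW| assumes W is symmetric:

```python
import numpy as np


def ring_weights(n, k=2):
    """Symmetric, row-standardized weights: each unit on a circle neighbors k units per side."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(1, k + 1):
            W[i, (i + j) % n] = W[i, (i - j) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def sar_ml(y, X, W, grid=np.linspace(-0.95, 0.95, 381)):
    """ML for y = rho*W y + X beta + e, e ~ N(0, sigma^2 I), by grid search over rho.

    Concentrated log-likelihood (constants dropped):
        -n/2 * log(sigma^2(rho)) + log|I - rho W|
    """
    n = len(y)
    omega = np.linalg.eigvalsh(W)          # log|I - rho W| = sum(log(1 - rho * omega))
    Wy = W @ y
    best = (-np.inf, None, None, None)
    for r in grid:
        y_star = y - r * Wy                # (I - rho W) y
        b = np.linalg.lstsq(X, y_star, rcond=None)[0]
        e = y_star - X @ b
        sigma2 = e @ e / n
        loglik = -0.5 * n * np.log(sigma2) + np.sum(np.log1p(-r * omega))
        if loglik > best[0]:
            best = (loglik, r, b, sigma2)
    return best[1:]                         # (rho, beta, sigma^2)


# Simulated data with known parameters (an illustration, not the county data)
rng = np.random.default_rng(0)
n, rho, beta = 900, 0.5, np.array([1.0, 2.0])
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + rng.normal(size=n))

rho_hat, beta_hat, sigma2_hat = sar_ml(y, X, W)
print(f"rho = {rho_hat:.3f}, beta = {np.round(beta_hat, 3)}")
```

The log-determinant term is what distinguishes ML from simply regressing (I − ρW)y on X. It penalizes values of ρ that would make the spatial multiplier explode. For a non-symmetric W, `np.linalg.slogdet` is a safer way to compute it.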

Next, we combine all three types of spatial dependence commonly used in spatial econometrics. Above, we used only the spatial autoregressive model, in which other counties' unemployment enters as a regressor. The model below allows three forms of spatial dependence, corresponding to three different types of spatial econometric models:
1. college in other counties (spatial lag of an independent variable);
2. unemployment in other counties (spatial lag of the dependent variable), i.e., the spatial autoregressive model;
3. correlation among the errors (spatial lag of the error), which is somewhat like the moving average (MA) model we are used to.
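In matrix notation (ours, for illustration), the combined model can be written as follows. θ multiplies the spatially lagged regressors (form 1), ρ is the coefficient on the spatially lagged dependent variable (form 2), and λ governs the spatially lagged error (form 3). M is the weighting matrix of the error process and may simply equal W. As far as we know, Stata's spregress requests these three pieces with its ivarlag(), dvarlag(), and errorlag() options. Check the manual for your version.

```latex
\begin{aligned}
y &= \rho W y + X\beta + W X\theta + u, \\
u &= \lambda M u + \varepsilon .
\end{aligned}
```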


Note that we generally do not report the coefficients above as results. They cannot be read directly as direct and indirect effects, because spatial feedback is recursive. A affects B, B affects C, A also affects C, C affects B, and C affects A, so the individual effects cannot be separated from the coefficients. Instead, we compute the average effect of each variable, shown below. For every 1 percentage point increase in the college graduation rate, the unemployment rate falls by about 0.16 percentage points. The indirect effect accounts for most of this (0.1 / 0.16 ≈ 62%), so the spatial spillover effect cannot be ignored.
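Here is a sketch of how such average effects can be computed from the estimated ρ, β, and θ. It uses the LeSage-Pace summary measures. Stata's estat impact reports averages of the same kind of derivatives, though its exact computation and standard errors may differ. The ring matrix and the coefficient values below are hypothetical:

```python
import numpy as np


def impacts(W, rho, beta_k, theta_k=0.0):
    """Average direct, indirect and total impacts of regressor k in
    y = rho*W y + X beta + W X theta + e (LeSage-Pace summary measures).

    S = (I - rho W)^{-1} (beta_k I + theta_k W) holds dy_i / dx_jk for all i, j.
    """
    n = W.shape[0]
    S = np.linalg.solve(np.eye(n) - rho * W, beta_k * np.eye(n) + theta_k * W)
    direct = np.trace(S) / n          # average own-county effect (diagonal of S)
    total = S.sum() / n               # average row sum: x_k rises by 1 in every county
    return direct, total - direct, total


# Hypothetical inputs: row-standardized ring of 50 units, illustrative coefficients
n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
direct, indirect, total = impacts(W, rho=0.4, beta_k=-0.1, theta_k=-0.02)
print(f"direct={direct:.4f} indirect={indirect:.4f} total={total:.4f}")
```

With a row-standardized W, the total effect has a closed form, (β + θ) / (1 − ρ), which is -0.12 / 0.6 = -0.2 here. This is a handy check on any software output. The indirect effect is whatever remains after the diagonal (direct) part is removed.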


Building on this model, we change the specification slightly by adding a second spatial weighting matrix. The two highlighted terms correspond to the two matrices: W is the contiguity matrix and M is the inverse-distance matrix. In other words, a spatial regression can include several weighting matrices at once, each used for a different component, such as the spatial autoregressive term or the spatial error term.


The average effects now show that for every 1 percentage point increase in the college graduation rate, the unemployment rate drops by 0.20 percentage points, a larger effect than before. As for which specification to choose, we suggest reporting both in your research. Either way, the effect is negative, consistent with the research hypothesis.


Endogeneity in spatial econometrics

Next, we turn to endogeneity, again with an example showing how to handle it in a spatial model. We want to know whether the drunk-driving (DUI) arrest rate is affected by the number of police officers in a county. More police could either raise or lower the DUI arrest rate. The map below shows the distribution of DUI arrest rates across counties in the southern United States.


The variables related to the DUI arrest rate (dui) are:

- police: the number of police officers
- nondui: the arrest rate for other offenses
- vehicles: the number of vehicles
- dry: whether the county prohibits alcohol

However, DUI arrests and police numbers are jointly determined. If past arrest rates were high, the county will naturally hire more police, so causality also runs from dui to police. We therefore use whether there is a county election as an instrumental variable for police, on the grounds that it satisfies the relevance and exogeneity conditions.
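The simulation below shows why instrumenting matters when police numbers respond to the same shocks that drive DUI arrests. It uses a hypothetical data-generating process, not the DUI data, and it omits the spatial terms. Stata handles the spatial version with spivregress. The binary "election" variable shifts police but has no direct effect on dui, which is the exclusion restriction assumed in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_effect = -0.5                                         # hypothetical effect of police on dui

election = rng.binomial(1, 0.5, size=n).astype(float)      # instrument: shifts police only
shock = rng.normal(size=n)                                 # unobserved drivers of DUI arrests
police = 1.0 * election + 0.8 * shock + rng.normal(size=n) # police respond to the same shock
dui = true_effect * police + shock

# OLS slope is contaminated by the common shock; the IV (Wald) estimator is not
ols = np.cov(police, dui)[0, 1] / np.var(police, ddof=1)
iv = np.cov(election, dui)[0, 1] / np.cov(election, police)[0, 1]
print(f"OLS: {ols:.3f}   IV: {iv:.3f}   truth: {true_effect}")
```

In this setup, OLS is pulled toward zero (roughly -0.08 in large samples), because counties hit by a bad shock both hire more police and record more DUI arrests. The instrument recovers the true negative effect.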


The average effects show that once endogeneity is handled with the instrument, more police officers reduce a county's DUI arrest rate. This strongly suggests that a county wishing to reduce DUI arrests should increase its police force. The conclusion may seem counterintuitive, but it matches the situation in China. There, places with more police officers tend to have much lower rates of drink-related arrests (you may disagree).

Panel spatial regression

Next is a panel dataset. The panel identifier is _ID, and year covers four periods: 1960, 1970, 1980, and 1990. So (_ID=876, year=1960) is the observation for Hancock County, West Virginia, in 1960. For the advantages of panel data, see the article "How does panel data deal with endogeneity, an article that makes people suddenly clear". In short, panel data help us track each unit over time and better handle confounding from unobserved factors.

We want to study whether a county's Gini coefficient (gini) directly affects its suicide rate (hrate). Intuitively, a large gap between rich and poor can cause psychological problems such as depression. We also control for the county's population density (ln_density) and population size (ln_population), since both may affect the suicide rate.


We first run an ordinary random-effects regression. The results show that a higher Gini coefficient is associated with a higher suicide rate (the coefficient on gini is positive). After adding year effects, we find that suicide rates in 1970, 1980, and 1990 were higher than in 1960. In the output, sigma_u is the standard deviation of the panel-level effect and sigma_e is the standard deviation of epsilon_it.


Below is the spatial weighting matrix (SWM) built with the contiguity method. Note that it is created from the 1990 cross-section (year == 1990). We assume that counties that were neighbors in 1990 were also neighbors in 1960, 1970, and 1980. There may be exceptions, which you should check for.
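Reusing the 1990 matrix for every year amounts to a block-diagonal panel weighting matrix, I_T ⊗ W. The same neighbors apply within each year, and there are no links across years. Here is a small sketch. It assumes the rows are stacked year by year with counties in the same order each year, and it uses a hypothetical 3-county chain as the 1990 map:

```python
import numpy as np

# Hypothetical 1990 contiguity among 3 counties in a chain: A-B, B-C
W1990 = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)
W1990 = W1990 / W1990.sum(axis=1, keepdims=True)   # row-standardize

years = [1960, 1970, 1980, 1990]
# Rows stacked year by year (all counties for 1960, then 1970, ...):
# each diagonal block is W1990, and every off-diagonal block is zero.
W_panel = np.kron(np.eye(len(years)), W1990)
print(W_panel.shape)
```

If the county ordering differs between years, this construction silently links the wrong counties. Sorting by year and then by _ID before building the matrix avoids that.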


Next, we fit the spatial panel regression. It includes the spatial lag of the dependent variable (hrate) and the spatial lag of the error (e.hrate); see the earlier part of this article for what these terms mean. We reach a similar conclusion: a higher Gini coefficient is associated with a higher suicide rate.


As before, the coefficients above do not give the direct and indirect effects of gini, so we read them from the average-effects table. It shows that a one-unit increase in the Gini coefficient raises suicides per 100,000 people in the county by 46.85. The direct and indirect effects are of similar size, which indicates that suicide rates in other counties do affect the suicide rate in our county (this deserves attention). As for the exact transmission mechanism, as the Chinese saying goes, each side claims to be right, and settling who is right is not our concern here.


Earlier, we found that suicide rates in 1970, 1980, and 1990 were higher than in 1960. This raises the question of whether the Gini coefficient's influence on the suicide rate has grown over time. The next figure interacts the Gini coefficient with year. With 1960 as the base period, the influence of the Gini coefficient on hrate appears to have increased sharply.


The figure below shows that the year × gini interaction terms have a significant effect on the suicide rate.


Below, we show the average effects separately for each of the four years to see whether the influence of the Gini coefficient on the suicide rate grows over time.

The following is the average effect of the Gini coefficient on the suicide rate in 1960, which is 54.9.

The following is the average effect of the gini coefficient on the suicide rate in 1970, which is 75.4.


The following is the average effect of the gini coefficient on the suicide rate in 1980, which is 131.5.


The following is the average effect of the gini coefficient on the suicide rate in 1990, which is 231.8.


Taken together, the four average-effect tables show that the influence of the Gini coefficient on the suicide rate increases steadily over time.

When the model includes an error lag (errorlag), adding the sarpanel option lets the individual effects (here, county effects) follow the same spatial autoregressive process as the error term. In other words, the a_i we often talk about also take the spatial-autoregressive form of the error. In this dataset, putting the individual effects into the spatial process yields results similar to those from the earlier spatial regression that does not do so.


Below are the fixed-effects estimates for the spatial panel model. The fixed-effects and random-effects estimators are computed differently (details omitted here). In our runs, the random-effects model estimated earlier by maximum likelihood took considerably longer than the fe model. The difference comes from the algorithms involved. Again, a higher Gini coefficient is associated with a higher suicide rate, consistent with the previous results.
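The fixed-effects estimator removes the county effects with a transformation before estimating. The spatial fe estimator uses a more careful transformation than plain demeaning, so the sketch below shows only the non-spatial idea underneath it: the "within" transformation. It uses a hypothetical noise-free toy panel in which y equals a county effect plus 2 · x:

```python
import numpy as np


def within_transform(values, unit_ids):
    """Subtract each unit's time average (the fixed-effects 'within' transformation)."""
    values = np.asarray(values, dtype=float)
    unit_ids = np.asarray(unit_ids)
    out = values.copy()
    for u in np.unique(unit_ids):
        mask = unit_ids == u
        out[mask] -= values[mask].mean(axis=0)
    return out


# Toy panel: 2 counties x 4 years, y = county effect + 2 * x (no noise)
ids = np.array([1, 1, 1, 1, 2, 2, 2, 2])
x = np.array([0.30, 0.32, 0.35, 0.40, 0.45, 0.44, 0.47, 0.50])
county_effect = np.where(ids == 1, 5.0, -3.0)
y = county_effect + 2.0 * x

y_w, x_w = within_transform(y, ids), within_transform(x, ids)
slope = (x_w @ y_w) / (x_w @ x_w)   # recovers 2 (up to rounding): county effects are swept out
print(slope)
```

Because the county effects are constant over time, demeaning wipes them out exactly. That is why fe needs no assumption about how they relate to gini, unlike re.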


The average-effects table below shows that the total effect of the Gini coefficient on the suicide rate is 37.7. That is, a one-unit increase in the Gini coefficient raises suicides per 100,000 people in the county by 37.7. This is smaller than the effect from the random-effects spatial panel model (46.85).


These are the spatial econometric methods our Spatial Econometrics Research Group recommends to members of the Econometrics Circle. If you want to study spatial econometrics further, you are welcome to join our group. We will introduce other spatial estimation methods in future posts. Instructions for joining the "Spatial Econometrics Research Group" are in the group announcement of the "Econometric Circle Community" (Little Goose Community). The group is open only to community members.

After consulting with the Econometrics Circle, our group offers the following ways to obtain the do files and data. We give more support to partners who support the group or the Circle; everything is mutual.

① If you are a partner in the econometrics community and have shared material in the Little Goose community at least 3 times (good material only), take screenshots of your shared posts and send them to the official account's backend to receive the files. The files are sent in batches, so please be patient.

② If you are a partner in the econometrics community but do not meet the above condition, you can still get the complete do file and data for this article. In the econometrics community (Little Goose Community), open the latest posts and select "Encyclopedia-style usage guide do file and data for spatial measurement".

③ If you are not a partner in the econometrics community and need the complete do file and data for this article, please join the econometrics community (below). Then get in touch to obtain the data and do file.


Source: blog.51cto.com/15057855/2679986